How to convert TensorFlow checkpoint files to TensorFlowJS?

How to convert TensorFlow checkpoint files to TensorFlowJS? - python

I have a project that was developed on TensorFlow v1 I think. It works in Python 3.8 like this:
...
saver = tf.train.Saver(var_list=vars)
...
saver.restore(self.sess, tf.train.latest_checkpoint(checkpoint_dir))
...
The checkpoint files reside in the "checkpoint_dir"
I would like to use this with TFjs but I can't figure out how to transform the checkpoint files to something that can be loaded with TFjs.
What should I do?
thanks,
John

Ok, I figured it out. Hope this helps other beginners like me too.
The checkpoint files do not contain the model, they only contain the values (weights, etc) of the model.
The model is actually built in the code. So, here are the steps to convert the Tensorflow v1 checkpoint files to TensorflowJS loadable model:
First I saved the checkpoint again because there was a file that was missing (.meta file) This contains some meta information about the values in the checkpoint. To save the checkpoint with meta I used this code right after the saver.restore(... call like this:
...
saver.save(self.sess,save_path='./newcheckpoint/')
...
Save the model as a frozen model file like this:
import tensorflow.compat.v1 as tf
meta_path = './newcheckpoint/.meta' # Your .meta file
output_node_names = ['name_of_the_output_node'] # Output nodes
with tf.Session() as sess:
# Restore the graph
saver = tf.train.import_meta_graph(meta_path)
# Load weights
saver.restore(sess,tf.train.latest_checkpoint('./newcheckpoint/'))
# Freeze the graph
frozen_graph_def = tf.graph_util.convert_variables_to_constants(
sess,
sess.graph_def,
output_node_names)
# Save the frozen graph
with open('./freeze/output_graph.pb', 'wb') as f:
f.write(frozen_graph_def.SerializeToString())
This will save the model to ./freeze/output_graph.pb
Using tensorflowjs_converter convert the frozen model to a web model like this:
tensorflowjs_converter --input_format=tf_frozen_model --output_node_names='final_add' --skip_op_check ./freeze/output_graph.pb ./web_model/
Had to use the --skip_op_check due to some missing op errors/warnings when trying to convert.
As a result of step 3, the ./webmodel/ folder will contain the JSON and binary files required by the TensorflowJS library.
Here's how I load the model using tfjs 2.x:
model=await tf.loadGraphModel('web_model/model.json');

Related

Extract labels from tflite model file

I have a trained TF-Lite model (model.tflite) for image classification with several labels. The output of the model provides an array of probabilities, but I don't know the order to the labels.
Can I extract the labels from the TF model?

I think this might extract the metadata
pip install tflite_support
import os
from tflite_support import metadata as _metadata
from tflite_support import metadata_schema_py_generated as _metadata_fb
model_file = <model_path>
displayer = _metadata.MetadataDisplayer.with_model_file(model_file)
export_json_file = os.path.join(os.path.splitext(model_file)[0] + ".json")
json_file = displayer.get_metadata_json()
with open(export_json_file, "w") as f:
f.write(json_file)

The simplest thing to do is to dump the labels file from the TF Lite model file. That file is a zip archive, so just do this:
unzip mobilenet_v1_0.75_160_quantized_1_metadata_1.tflite
Archive: mobilenet_v1_0.75_160_quantized_1_metadata_1.tflite
extracting: labels.txt
The "labels.txt" file (or something similarly named) contains the list of labels for the model.
Reference (and more info on how to read TF Lite model metadata): https://www.tensorflow.org/lite/models/convert/metadata#read_the_associated_files_from_models
Note: A TF Lite model is not guaranteed to contain a labels file like this, but most publicly published models, such as ones on tfhub.dev, should have this metadata included.

How do I convert a .meta .index and .data file into SavedModel (.pb) format without losing metagraphdef?

I'm trying to convert these three files of a pre-trained model:
semantic_model.data-00000-of-00001
semantic_model.index
semantic_model.meta
into a Saved Model format, so that I can later convert it into TFLite format for Inference.
Searching StackOverflow, I'd come across this code, which properly generates the Saved_model.pb, however as noted in some comments, doing it in this way doesn't keep the Meta Graph Definitions, which causes an error when I later try to convert it into TFlite format or freeze it.
import os
import tensorflow.compat.v1 as tf
tf.compat.v1.disable_eager_execution()
export_dir = '/tf-end-to-end/export_dir'
#trained_checkpoint_prefix = 'Models/semantic_model' \tf-end-to-end\Models
trained_checkpoint_prefix = 'PATH TO MODEL DIRECTORY'
tf.reset_default_graph()
graph = tf.Graph()
loader = tf.train.import_meta_graph(trained_checkpoint_prefix + ".meta" )
sess = tf.Session()
loader.restore(sess,trained_checkpoint_prefix)
builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
builder.add_meta_graph_and_variables(sess, [tf.saved_model.tag_constants.TRAINING, tf.saved_model.tag_constants.SERVING], strip_default_attrs=True)
builder.save()
This is the error I get when trying to use the saved_model:
RuntTimeError: MetaGraphDef associated with tags {'serve'} could not be found in SavedModel
Running the showsavedmodelcli --all doesn't display anything under signature definitions for the created saved_model.
My question is, how do I maintain the data and convert this to saved_model, for later conversion into TFLite format?
Model Structure and creation details can be seen here, including the checkpoint files mentioned: https://github.com/OMR-Research/tf-end-to-end

Refer to these steps for converting checkpoints to a TFLite model: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/r1/convert/python_api.md#convert-checkpoints-

Is saved_model.pb from keras.models.save_model the same with tensorflow freeze_graph output .pb file?

After a model is trained in keras, I used to apply tf.compat.v1.graph_util.convert_variables_to_constants or freeze_graph.py to freeze model and output .pb file. Like this:
output_graph_def = tf.compat.v1.graph_util.convert_variables_to_constants(sess, input_graph_def, output_node_names)
with tf.gfile.GFile('model.pb', "wb") as f:
f.write(output_graph_def.SerializeToString())
Recently, I find tf.compat.v1.graph_util.convert_variables_to_constants is labeled with: Warning: THIS FUNCTION IS DEPRECATED.
So I'm looking for a updated method of generate .pb file. I find this: keras.models.save_model() to save model and output dir contains:
assets saved_model.pb variables
I'm not sure if this saved_model.pb is the same with output .pb file of tf.compat.v1.graph_util.convert_variables_to_constants?
If not, could someone recommed a better way to get frozen model (.pb) file?
Thanks.

I have confirmed they are not the same .pb file. If you load and run SavedModel format, you'll get error: Data loss: Can't parse testmodel/saved_model.pb as binary proto

Restore intermediate checkpoint files of Tensorflow

Tensorflow version =1.8.0
I am trying to restore my model using one of the intermediate checkpoint files in Tensorflow. By default Tensorflow will take the last saved checkpoint file.
For example, the folder contains files like:
checkpoint
model-56000.index model-56000.data-00000-of-00001 model-56000.meta model-57000.index model-57000.data-00000-of-00001 model-57000.meta
By default, Tensorflow loads the last 57K checkpoint, but for reasons, I want to load the weights for the 56K checkpoint.
Following is my code for restoring the model:
def load_G(self, checkpoint_dir):
print(" [*] Reading checkpoints of G...")
ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
if ckpt and ckpt.model_checkpoint_path:
ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
self.saver_gen.restore(self.sess, os.path.join(checkpoint_dir, ckpt_name))
return True
else:
return False
From Tensorflow's page, I read that for tf.train.get_checkpoint_state(), I can specify tf.train.get_checkpoint_state(checkpoint_dir, latest_filename=None). But I am not able to figure, what should I write for latest_filename. I tried writing latest_filename = model-56000
But that did not load the model.
I also tried writing latest_filename = model-56000.meta. That also did not work.
So, what is the correct way to load some intermediate checkpoint files in Tensorflow.

Ok,so a hack is to modify the checkpoint protobuf file and change the first line of that file from: model_checkpoint_path: "model-57000" to model_checkpoint_path: "model-56000" ad now it loads the 56K checkpoint.
Looking for some better ways to do this.

The ckpt file name will be model-56000.ckpt
model-56000.meta points to the meta information of the ckpt
model-56000 is file name for either ckpt, data files, or meta files

Visualize Gensim Word2vec Embeddings in Tensorboard Projector

I've only seen a few questions that ask this, and none of them have an answer yet, so I thought I might as well try. I've been using gensim's word2vec model to create some vectors. I exported them into text, and tried importing it on tensorflow's live model of the embedding projector. One problem. It didn't work. It told me that the tensors were improperly formatted. So, being a beginner, I thought I would ask some people with more experience about possible solutions.
Equivalent to my code:
import gensim
corpus = [["words","in","sentence","one"],["words","in","sentence","two"]]
model = gensim.models.Word2Vec(iter = 5,size = 64)
model.build_vocab(corpus)
# save memory
vectors = model.wv
del model
vectors.save_word2vec_format("vect.txt",binary = False)
That creates the model, saves the vectors, and then prints the results out nice and pretty in a tab delimited file with values for all of the dimensions. I understand how to do what I'm doing, I just can't figure out what's wrong with the way I put it in tensorflow, as the documentation regarding that is pretty scarce as far as I can tell.
One idea that has been presented to me is implementing the appropriate tensorflow code, but I don’t know how to code that, just import files in the live demo.
Edit: I have a new problem now. The object I have my vectors in is non-iterable because gensim apparently decided to make its own data structures that are non-compatible with what I'm trying to do.
Ok. Done with that too! Thanks for your help!

What you are describing is possible. What you have to keep in mind is that Tensorboard reads from saved tensorflow binaries which represent your variables on disk.
More information on saving and restoring tensorflow graph and variables here
The main task is therefore to get the embeddings as saved tf variables.
Assumptions:
in the following code embeddings is a python dict {word:np.array (np.shape==[embedding_size])}
python version is 3.5+
used libraries are numpy as np, tensorflow as tf
the directory to store the tf variables is model_dir/
Step 1: Stack the embeddings to get a single np.array
embeddings_vectors = np.stack(list(embeddings.values(), axis=0))
# shape [n_words, embedding_size]
Step 2: Save the tf.Variable on disk
# Create some variables.
emb = tf.Variable(embeddings_vectors, name='word_embeddings')
# Add an op to initialize the variable.
init_op = tf.global_variables_initializer()
# Add ops to save and restore all the variables.
saver = tf.train.Saver()
# Later, launch the model, initialize the variables and save the
# variables to disk.
with tf.Session() as sess:
sess.run(init_op)
# Save the variables to disk.
save_path = saver.save(sess, "model_dir/model.ckpt")
print("Model saved in path: %s" % save_path)
model_dir should contain files checkpoint, model.ckpt-1.data-00000-of-00001, model.ckpt-1.index, model.ckpt-1.meta
Step 3: Generate a metadata.tsv
To have a beautiful labeled cloud of embeddings, you can provide tensorboard with metadata as Tab-Separated Values (tsv) (cf. here).
words = '\n'.join(list(embeddings.keys()))
with open(os.path.join('model_dir', 'metadata.tsv'), 'w') as f:
f.write(words)
# .tsv file written in model_dir/metadata.tsv
Step 4: Visualize
Run $ tensorboard --logdir model_dir -> Projector.
To load metadata, the magic happens here:
As a reminder, some word2vec embedding projections are also available on http://projector.tensorflow.org/

Gensim actually has the official way to do this.
Documentation about it

The above answers didn't work for me. What I found out pretty useful was this script (will be added to gensim in the future) Source
To transform the data to metadata:
model = gensim.models.Word2Vec.load_word2vec_format(model_path, binary=True)
with open( tensorsfp, 'w+') as tensors:
with open( metadatafp, 'w+') as metadata:
for word in model.index2word:
encoded=word.encode('utf-8')
metadata.write(encoded + '\n')
vector_row = '\t'.join(map(str, model[word]))
tensors.write(vector_row + '\n')
Or follow this gist

the gemsim provide convert method word2vec to tf projector file
python -m gensim.scripts.word2vec2tensor -i ~w2v_model_file -o output_folder
add in projector wesite, upload the metadata

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to convert TensorFlow checkpoint files to TensorFlowJS? - python

Related

Extract labels from tflite model file

How do I convert a .meta .index and .data file into SavedModel (.pb) format without losing metagraphdef?

Is saved_model.pb from keras.models.save_model the same with tensorflow freeze_graph output .pb file?

Restore intermediate checkpoint files of Tensorflow

Visualize Gensim Word2vec Embeddings in Tensorboard Projector

Categories

Resources