I defined the following custom metric to train my model in TensorFlow:
import tensorflow as tf
from tensorflow import keras as ks

N_CLASSES = 15

class MulticlassMeanIoU(tf.keras.metrics.MeanIoU):
    def __init__(self,
                 y_true=None,
                 y_pred=None,
                 num_classes=None,
                 name="Multi_MeanIoU",
                 dtype=None):
        super(MulticlassMeanIoU, self).__init__(num_classes=num_classes,
                                                name=name, dtype=dtype)
        self.__name__ = name

    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "num_classes": self.num_classes}

    def update_state(self, y_true, y_pred, sample_weight=None):
        # convert softmax/one-hot predictions to class indices, as MeanIoU expects
        y_pred = tf.math.argmax(y_pred, axis=-1)
        return super().update_state(y_true, y_pred, sample_weight)

met = MulticlassMeanIoU(num_classes=N_CLASSES)
After training, I save the model, and I also tried to save the custom metric object as follows:
import pickle

with open("/some/path/custom_metrics.pkl", "wb") as f:
    pickle.dump(met, f)
However, when I try to load the metric like this:
with open(path_custom_metrics, "rb") as f:
    met = pickle.load(f)
I always get some errors, e.g. AttributeError: 'MulticlassMeanIoU' object has no attribute 'update_state_fn'.
Now I wonder whether it is possible to pickle a custom metric at all and, if so, how? It would come in handy if I could save custom metrics with the model, so that when I load the model in another Python session, I would already have the metric that is required to load the model in the first place. I could define the metric anew by copying the full code into the other script before loading the model, but I think this would be bad style and could cause problems if I changed the metric in the training script and forgot to copy the code to the other script.
If you need to pickle a metric, one possible solution is to implement the __getstate__() and __setstate__() methods. These two methods are called during (de)serialization, if they are available. Add them to your class and you will have what you need. I tried to make this as general as possible, so that it works for any Metric:
from typing import Any, Dict

import tensorflow as tf

def __getstate__(self):
    # capture the current values of all metric variables
    variables = {v.name: v.numpy() for v in self.variables}
    state = {
        name: variables[var.name]
        for name, var in self._unconditional_dependency_names.items()
        if isinstance(var, tf.Variable)}
    state['name'] = self.name
    state['num_classes'] = self.num_classes
    return state

def __setstate__(self, state: Dict[str, Any]):
    # rebuild the metric, then restore the saved variable values
    self.__init__(name=state.pop('name'), num_classes=state.pop('num_classes'))
    for name, value in state.items():
        self._unconditional_dependency_names[name].assign(value)
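For example, with those two methods added to MulticlassMeanIoU, a round trip through pickle could look like this (a minimal sketch; the path is just a placeholder):

import pickle

met = MulticlassMeanIoU(num_classes=N_CLASSES)

with open("custom_metrics.pkl", "wb") as f:
    pickle.dump(met, f)

with open("custom_metrics.pkl", "rb") as f:
    restored = pickle.load(f)

# the restored object is a fully initialized metric again
print(restored.name, restored.num_classes)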
Some time ago I saved a class instance into a pickle file.
<library.module.Class at 0x1c926b2e520>
That class imports two libraries whose names have changed in the meantime.
Is it possible to edit the pickle file so I can update those two imports without regenerating the pickle all over again?
Thank you!
Regards
EDIT:
This is how I am loading the pickle:
import os
import pickle

model_path = os.getenv('MODELS_FOLDER') + 'model_20210130.pkl'
model = pickle.load(open(model_path, 'rb'))
This is the content of the pickled class. The two imports I want to update are marked.
import socceraction.spadl.config as spadlconfig       # <-- import to update
from socceraction.spadl.base import SPADLSchema       # <-- import to update

class ExpectedThreat:
    """An implementation of the model."""

    def __init__(self):
        ...

    def __solve(self) -> None:
        ...

    def fit(self, actions: DataFrame[SPADLSchema]) -> 'ExpectedThreat':
        """Fits the xT model with the given actions."""

    def predict(
        self, actions: DataFrame[SPADLSchema], use_interpolation: bool = False
    ) -> np.ndarray:
        """Predicts the model values for the given actions."""
I don't think you can do that.
Pickled objects are serialized content; to be modified, they first have to be properly de-serialized.
Why can't you just load the object, change it, and overwrite the file?
You can still overwrite its methods, like any other attribute, with new ones:
# import the renamed libraries
import new_socceraction.spadl.config as new_spadlconfig
from new_socceraction.spadl.base import new_SPADLSchema

import os
import pickle

model_path = os.getenv('MODELS_FOLDER') + 'model_20210130.pkl'
with open(model_path, 'rb') as file:
    model = pickle.load(file)

# define new methods that use the renamed libraries
def fit(self, actions: DataFrame[new_SPADLSchema]) -> 'ExpectedThreat':
    """Fits the xT model with the given actions."""
    pass  # your new implementation

def predict(
    self, actions: DataFrame[new_SPADLSchema], use_interpolation: bool = False
) -> np.ndarray:
    """Predicts the model values for the given actions."""
    pass  # your new implementation

# overwrite the methods on the class, so normal model.fit(...) calls use the new code
type(model).fit = fit
type(model).predict = predict

# save it again
with open(model_path, 'wb') as file:
    pickle.dump(model, file)
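If the problem is that pickle.load itself fails because the old module paths no longer exist (ModuleNotFoundError), one load-time workaround, rather than an edit of the file itself, is a custom Unpickler that remaps old module names to new ones while loading. A sketch, assuming the package was renamed to new_socceraction as above (which module names actually appear in the stream depends on what was pickled):

import os
import pickle

class RenamedModuleUnpickler(pickle.Unpickler):
    """Redirects lookups of renamed modules to their new location."""

    RENAMES = {
        'socceraction.spadl.config': 'new_socceraction.spadl.config',
        'socceraction.spadl.base': 'new_socceraction.spadl.base',
    }

    def find_class(self, module, name):
        module = self.RENAMES.get(module, module)
        return super().find_class(module, name)

model_path = os.getenv('MODELS_FOLDER') + 'model_20210130.pkl'
with open(model_path, 'rb') as file:
    model = RenamedModuleUnpickler(file).load()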
I've extended the Python implementation of WGAN-GP from here: https://keras.io/examples/generative/wgan_gp/
Basically, I added a callback to the fit function:
from tensorflow import keras

class GANCheckpoint(keras.callbacks.Callback):
    def __init__(self, cpkt=None, manager=None):
        self.cpkt = cpkt
        self.manager = manager

    def on_epoch_begin(self, epoch, logs=None):
        if self.manager.latest_checkpoint:
            self.cpkt.restore(self.manager.latest_checkpoint)
            print("Restored from {}".format(self.manager.latest_checkpoint))
        else:
            print("Initializing from scratch.")

    def on_epoch_end(self, epoch, logs=None):
        save_path = self.manager.save()
        self.cpkt.step.assign_add(1)
        print("\nSaved checkpoint for step {}: {}".format(int(self.cpkt.step), save_path))
And the checkpoint manager is initialized as:
# checkpoint manager setup
checkpoint_dir = './training_checkpoints/GAN/'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(step=tf.Variable(1),
                                 d_model=d_model, g_model=g_model,
                                 discriminator_optimizer=discriminator_optimizer,
                                 generator_optimizer=generator_optimizer)
manager = tf.train.CheckpointManager(checkpoint, checkpoint_dir, max_to_keep=None)
cbk = GANCheckpoint(cpkt=checkpoint, manager=manager)
Finally I have the fit call:
wgan.fit(X, batch_size=BATCH_SIZE, epochs=epochs, verbose = True, callbacks=[cbk])
I'm using checkpoint.restore(manager.latest_checkpoint) to restore the weights in another Python file.
However, my generator results are way off compared to what they are supposed to be.
I'm using the following code:
for i in range(10):
    a = tf.random.normal(shape=(1, 128))
    sample = checkpoint.g_model.predict(a)
    print(sample)
I checked the weights of the generator and optimizer; they're coherent and seem identical.
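(For reference, a tf.train.Checkpoint is not tied to the script that created it; restore matches variables by the object structure of the checkpoint. So the restoring file has to rebuild objects with the same attribute names before calling restore. A minimal sketch of the restore side, where build_generator and build_discriminator are hypothetical stand-ins for the model definitions:)

import tensorflow as tf

# rebuild objects with the same structure as in the training script
g_model = build_generator()          # hypothetical constructor
d_model = build_discriminator()      # hypothetical constructor
generator_optimizer = tf.keras.optimizers.Adam()
discriminator_optimizer = tf.keras.optimizers.Adam()

checkpoint = tf.train.Checkpoint(step=tf.Variable(1),
                                 d_model=d_model, g_model=g_model,
                                 discriminator_optimizer=discriminator_optimizer,
                                 generator_optimizer=generator_optimizer)
manager = tf.train.CheckpointManager(checkpoint, './training_checkpoints/GAN/',
                                     max_to_keep=None)

status = checkpoint.restore(manager.latest_checkpoint)
# fails loudly if graph variables were not found in the checkpoint
status.assert_existing_objects_matched()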
Are checkpoints tied to a specific Python file?
Additionally, even when I try to restore a checkpoint without fitting the model first, in the original Python file, it does not work either.
Do you have any idea?
Thanks in advance
My question concerns an example available in the great huggingface/transformers library.
I am using a notebook provided by the library creators as a starting point for my pipeline. It presents a pipeline for fine-tuning BERT for sentence classification on the GLUE dataset.
When getting into the code, I noticed a very weird thing, which I cannot explain.
In the example, input data is introduced to the model as instances of the InputFeatures class from here:
This class has 4 attributes, including the label attribute:
class InputFeatures:
    ...
    input_ids: List[int]
    attention_mask: Optional[List[int]] = None
    token_type_ids: Optional[List[int]] = None
    label: Optional[Union[int, float]] = None
which are later passed as a dictionary of inputs to the forward() method of the model. This is done by the Trainer class, for example in lines 573-576 here:
def _training_step(
    self, model: nn.Module, inputs: Dict[str, torch.Tensor], optimizer: torch.optim.Optimizer
) -> float:
    model.train()
    for k, v in inputs.items():
        inputs[k] = v.to(self.args.device)

    outputs = model(**inputs)
However, the forward() method expects a labels (note the plural form) input parameter (taken from here):
def forward(
    self,
    input_ids=None,
    attention_mask=None,
    head_mask=None,
    inputs_embeds=None,
    labels=None,
    output_attentions=None,
):
So my question is where does the label become labels in this pipeline?
To give some extra info on the issue: I created my own pipeline, which uses nothing related to the GLUE data and pipeline; basically, it relies only on the Trainer class of transformers. I even use another model (Flaubert). I replicated the InputFeatures class, and my code works for both cases below:
class InputFeature:
    def __init__(self, text, label):
        self.input_ids = text
        self.label = label

class InputFeaturePlural:
    def __init__(self, text, label):
        self.input_ids = text
        self.labels = label
But it does not work if I name the second attribute self.labe or anything else. Why is it possible to use both attribute names?
It's not like it is extremely important in my case, but I feel uncomfortable passing the data around in a variable that "changes name" somewhere along the way.
The rename happens in the collator. In the trainer init, when data_collator is None, a default one is used:
class Trainer:
    # ...
    def __init__(...):
        # ...
        self.data_collator = data_collator if data_collator is not None else default_data_collator
        # ...
FYI, the self.data_collator is later used when you get the dataloader:
data_loader = DataLoader(
    self.train_dataset,
    batch_size=self.args.train_batch_size,
    sampler=train_sampler,
    collate_fn=self.data_collator,  # <-- here
    drop_last=self.args.dataloader_drop_last,
)
The default collator has a special handling for labels, which does this renaming, if needed:
# Special handling for labels.
# Ensure that tensor is created with the correct type
# (it should be automatically the case, but let's make sure of it.)
if hasattr(first, "label") and first.label is not None:
    if type(first.label) is int:
        labels = torch.tensor([f.label for f in features], dtype=torch.long)
    else:
        labels = torch.tensor([f.label for f in features], dtype=torch.float)
    batch = {"labels": labels}  # <-- here is where it happens
elif hasattr(first, "label_ids") and first.label_ids is not None:
    if type(first.label_ids[0]) is int:
        labels = torch.tensor([f.label_ids for f in features], dtype=torch.long)
    else:
        labels = torch.tensor([f.label_ids for f in features], dtype=torch.float)
    batch = {"labels": labels}
else:
    batch = {}
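Depending on your transformers version, you can observe this renaming directly by calling the default collator on a couple of hand-made features (a small sketch; the token IDs are made up):

from transformers import default_data_collator

features = [
    {"input_ids": [101, 2023, 102], "label": 1},
    {"input_ids": [101, 2017, 102], "label": 0},
]

batch = default_data_collator(features)
print(sorted(batch.keys()))  # ['input_ids', 'labels'] -- 'label' became 'labels'
print(batch["labels"])       # tensor([1, 0])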
Hi, I have a Python script where I instantiate two objects of a neural network class.
Each object defines its own session and provides methods for saving the graph.
import tensorflow as tf
import os, shutil

class TestNetwork:
    def __init__(self, id):
        self.id = id
        tf.reset_default_graph()
        self.s = tf.placeholder(tf.float32, [None, 2], name='s')
        w_initializer, b_initializer = tf.random_normal_initializer(0., 1.0), tf.constant_initializer(0.1)
        self.k = tf.layers.dense(self.s, 2, kernel_initializer=w_initializer,
                                 bias_initializer=b_initializer, name='k')

        '''Defines self.session and initialize the variables'''
        session_conf = tf.ConfigProto(
            allow_soft_placement=True,
            log_device_placement=False)
        self.session = tf.Session(config=session_conf)
        self.session.run(tf.global_variables_initializer())

    def save_model(self, output_dir):
        '''Save the network graph and weights to disk'''
        if os.path.exists(output_dir):
            # if the provided output_dir already exists, remove it
            shutil.rmtree(output_dir)

        builder = tf.saved_model.builder.SavedModelBuilder(output_dir)
        builder.add_meta_graph_and_variables(
            self.session,
            [tf.saved_model.tag_constants.SERVING],
            clear_devices=True)
        # create a new directory output_dir and store the saved model in it
        builder.save()

t1 = TestNetwork(1)
t2 = TestNetwork(2)

t1.save_model("t1_model")
t2.save_model("t2_model")
The error I get is:
TypeError: Cannot interpret feed_dict key as Tensor: The name 'save/Const:0' refers to a Tensor which does not exist. The operation, 'save/Const', does not exist in the graph.
I read something saying that this error is due to tf.train.Saver.
Thus I added the following line at the end of the __init__ method:
self.saver = tf.train.Saver(tf.global_variables(), max_to_keep = 5)
However I still get the error.
tf.reset_default_graph clears the default graph stack and resets the global default graph.
NOTE: The default graph is a property of the current thread. This
function applies only to the current thread. Calling this function
while a tf.Session or tf.InteractiveSession is active will result in
undefined behavior. Using any previously created tf.Operation or
tf.Tensor objects after calling this function will result in undefined
behavior.
You should create a separate tf.Graph for each instance, and define all of these ops within the corresponding graph's scope:
def __init__(self, id):
    self.id = id
    self.graph = tf.Graph()
    with self.graph.as_default():
        self.s = tf.placeholder(tf.float32, [None, 2], name='s')
        w_initializer, b_initializer = tf.random_normal_initializer(0., 1.0), tf.constant_initializer(0.1)
        self.k = tf.layers.dense(self.s, 2, kernel_initializer=w_initializer,
                                 bias_initializer=b_initializer, name='k')
        init = tf.global_variables_initializer()

    '''Defines self.session and initialize the variables'''
    session_conf = tf.ConfigProto(
        allow_soft_placement=True,
        log_device_placement=False)
    self.session = tf.Session(config=session_conf, graph=self.graph)
    self.session.run(init)
tf.train.Saver is another way to save model variables.
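A sketch of that alternative, with placeholder paths; note that the Saver also has to be created inside the instance's graph:

def __init__(self, id):
    # ... graph and session setup as above ...
    with self.graph.as_default():
        self.saver = tf.train.Saver(tf.global_variables(), max_to_keep=5)

def save_weights(self, path):
    # writes the .index/.data files plus a checkpoint file at path
    self.saver.save(self.session, path)

def load_weights(self, path):
    self.saver.restore(self.session, path)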
Edit
If the saved model ends up with an empty "variables" directory, you should also run the SavedModelBuilder inside the instance's graph:
def save_model(self, output_dir):
    '''Save the network graph and weights to disk'''
    if os.path.exists(output_dir):
        # if the provided output_dir already exists, remove it
        shutil.rmtree(output_dir)

    with self.graph.as_default():
        builder = tf.saved_model.builder.SavedModelBuilder(output_dir)
        builder.add_meta_graph_and_variables(
            self.session,
            [tf.saved_model.tag_constants.SERVING],
            clear_devices=True)
        # create a new directory output_dir and store the saved model in it
        builder.save()
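To check that the two exports really are independent, each SavedModel can be loaded back into its own graph and session. A minimal sketch; the tensor name k/BiasAdd:0 is an assumption based on the dense layer named 'k' above:

import tensorflow as tf

def load_model(export_dir):
    graph = tf.Graph()
    sess = tf.Session(graph=graph)
    with graph.as_default():
        # restores the MetaGraphDef and variable values written by SavedModelBuilder
        tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], export_dir)
    return graph, sess

g1, s1 = load_model("t1_model")
g2, s2 = load_model("t2_model")

s = g1.get_tensor_by_name("s:0")
k = g1.get_tensor_by_name("k/BiasAdd:0")  # assumed output op of the dense layer
print(s1.run(k, feed_dict={s: [[1.0, 2.0]]}))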
Trying to run the Inception-ResNet-v2 TensorFlow model with the architecture and the checkpoint inception_resnet_v2_2016_08_30.ckpt. My code predicts the probability of each class for a given image.
I tried to construct the TensorFlow code using a class, following the awesome blog here. But I get this error:
NotFoundError (see above for traceback): Tensor name "prediction/InceptionResnetV2/AuxLogits/Conv2d_1b_1x1/BatchNorm/beta" not found in checkpoint files inception_resnet_v2_2016_08_30.ckpt.
My failing code is as follows.
from inception_resnet_v2 import *
import functools
import inception_preprocessing
import matplotlib.pyplot as plt
import os
import numpy as np
import tensorflow as tf
from scipy.misc import imread

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

def doublewrap(function):
    """
    A decorator decorator, allowing the decorator to be used without
    parentheses if no arguments are provided. All arguments must be optional.
    """
    @functools.wraps(function)
    def decorator(*args, **kwargs):
        if len(args) == 1 and len(kwargs) == 0 and callable(args[0]):
            return function(args[0])
        else:
            return lambda wrapee: function(wrapee, *args, **kwargs)
    return decorator

@doublewrap
def define_scope(function, scope=None, *args, **kwargs):
    """
    A decorator for functions that define TensorFlow operations. The wrapped
    function will only be executed once. Subsequent calls to it will directly
    return the result so that operations are added to the graph only once.
    The operations added by the function live within a tf.variable_scope(). If
    this decorator is used with arguments, they will be forwarded to the
    variable scope. The scope name defaults to the name of the wrapped
    function.
    """
    attribute = '_cache_' + function.__name__
    name = scope or function.__name__

    @property
    @functools.wraps(function)
    def decorator(self):
        if not hasattr(self, attribute):
            with tf.variable_scope(name, *args, **kwargs):
                setattr(self, attribute, function(self))
        return getattr(self, attribute)
    return decorator

class Inception(object):
    def __init__(self, image):
        self.image = image
        self.process_data  # triggers the property, building the preprocessing ops
        self.prediction    # triggers the property, building the prediction ops

    @define_scope
    def process_data(self):
        image_size = inception_resnet_v2.default_image_size
        image = inception_preprocessing.preprocess_image(self.image, image_size, image_size, is_training=False)
        image1 = tf.expand_dims(image, 0)
        return image1

    @define_scope
    def prediction(self):
        '''Creates the Inception Resnet V2 model.'''
        arg_scope = inception_resnet_v2_arg_scope()
        with tf.contrib.slim.arg_scope(arg_scope):
            logits, end_points = inception_resnet_v2(self.process_data, is_training=False)
        probabilities = tf.nn.softmax(logits)
        return probabilities

def main():
    tf.reset_default_graph()
    image = tf.placeholder(tf.float32, [None, None, 3])
    model = Inception(image)
    saver = tf.train.Saver()

    with tf.Session() as sess:
        saver.restore(sess, 'inception_resnet_v2_2016_08_30.ckpt')
        probabilities = sess.run(model.prediction, feed_dict={image: data})
        print(probabilities)

if __name__ == '__main__':
    data = imread('ILSVRC2012_test_00000003.JPEG', mode='RGB').astype(np.float)
    main()
However, if I don't construct the code using a class as above, it runs successfully.
The following is the code which ran without errors.
from inception_resnet_v2 import *
import inception_preprocessing
import os
import numpy as np
import tensorflow as tf
from scipy.misc import imread

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
slim = tf.contrib.slim

tf.reset_default_graph()

# prepare data
data = imread('ILSVRC2012_test_00000003.JPEG', mode='RGB').astype(np.float)
image = tf.placeholder(tf.float32, [None, None, 3])

# pre-process the image
image_size = inception_resnet_v2.default_image_size
processed_image = inception_preprocessing.preprocess_image(image, image_size, image_size, is_training=False)
processed_image = tf.expand_dims(processed_image, 0)

# create the Inception Resnet V2 model
arg_scope = inception_resnet_v2_arg_scope()
with slim.arg_scope(arg_scope):
    logits, end_points = inception_resnet_v2(processed_image, is_training=False)
probabilities = tf.nn.softmax(logits)

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, './inception_resnet_v2_2016_08_30.ckpt')
    print(sess.run(probabilities, feed_dict={image: data}))
Any help would be appreciated!
The decorator wraps the Inception network in a variable scope named after the function, prediction in this case. As a result, the variable names in the checkpoint no longer match the variable names in the graph.
To verify this, you can change tf.variable_scope() to tf.name_scope() in the decorator. In most use cases, this should not influence the rest of your program.
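Concretely, a sketch of that change to define_scope (the scope-kwargs forwarding is dropped, since tf.name_scope takes different arguments than tf.variable_scope):

@doublewrap
def define_scope(function, scope=None):
    attribute = '_cache_' + function.__name__
    name = scope or function.__name__

    @property
    @functools.wraps(function)
    def decorator(self):
        if not hasattr(self, attribute):
            # tf.name_scope only prefixes op names, not variable names,
            # so the variables keep the names stored in the checkpoint
            with tf.name_scope(name):
                setattr(self, attribute, function(self))
        return getattr(self, attribute)
    return decorator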
If you need the variable scope, you can pass a dict into tf.train.Saver() that maps variable names in the checkpoint to variable objects in the graph.
It's also possible to automate this by reading the variable names in the checkpoint using tf.python.pywrap_tensorflow.NewCheckpointReader(), but I don't have a code example ready to share for this.
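A rough, untested sketch of that idea, assuming the only mismatch is the extra prediction/ scope prefix from the question:

from tensorflow.python import pywrap_tensorflow
import tensorflow as tf

reader = pywrap_tensorflow.NewCheckpointReader('inception_resnet_v2_2016_08_30.ckpt')
ckpt_names = set(reader.get_variable_to_shape_map().keys())

# map checkpoint names -> graph variables by stripping the scope prefix
var_map = {}
for var in tf.global_variables():
    name = var.op.name
    if name.startswith('prediction/'):
        name = name[len('prediction/'):]
    if name in ckpt_names:
        var_map[name] = var

saver = tf.train.Saver(var_list=var_map)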