How would one best add a preprocessing layer (e.g., subtract mean and divide by std) to a Keras (v2.0.5) model such that the model becomes fully self-contained for deployment (possibly in a C++ environment)? I tried:
def getmodel():
    model = Sequential()
    mean_tensor = K.placeholder(shape=(1, 1, 3), name="mean_tensor")
    std_tensor = K.placeholder(shape=(1, 1, 3), name="std_tensor")
    preproc_layer = Lambda(lambda x: (x - mean_tensor) / (std_tensor + K.epsilon()),
                           input_shape=im_shape)
    model.add(preproc_layer)
    # Build the remaining model, perhaps set weights,
    ...
    return model
Then, somewhere else, set the mean/std on the model. I found the set_value function and tried the following:
m = getmodel()
mean, std = get_mean_std(..)
graph = K.get_session().graph
mean_tensor = graph.get_tensor_by_name("mean_tensor:0")
std_tensor = graph.get_tensor_by_name("std_tensor:0")
K.set_value(mean_tensor, mean)
K.set_value(std_tensor, std)
However, the set_value call fails with
AttributeError: 'Tensor' object has no attribute 'assign'
So set_value does not work as the (limited) docs would suggest. What would the proper way be to do this? Get the TF session, wrap all the training code in a with (session) block and use feed_dict? I would have thought there would be a native Keras way to set tensor values.
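For what it's worth, the error above suggests set_value needs something with an assign op, i.e. a backend variable created with K.variable rather than a placeholder tensor. A minimal sketch of that working (the values here are made up):

from keras import backend as K
import numpy as np

v = K.variable(np.zeros((1, 1, 3)), name="mean_var")
K.set_value(v, np.array([[[0.5, 0.5, 0.5]]]))  # works: variables have an assign op
print(K.get_value(v))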
Instead of using a placeholder I tried setting the mean/std on model construction using either K.variable or K.constant:
mean_tensor = K.variable(mean, name="mean_tensor")
std_tensor = K.variable(std, name="std_tensor")
This avoids any set_value problems. However, I notice that if I try to train that model (which I know is not particularly efficient, as the normalisation is redone for every image), it works, but at the end of the first epoch the ModelCheckpoint handler fails with a very deep stack trace:
...
File "/Users/dgorissen/Library/Python/2.7/lib/python/site-packages/keras/models.py", line 102, in save_model
'config': model.get_config()
File "/Users/dgorissen/Library/Python/2.7/lib/python/site-packages/keras/models.py", line 1193, in get_config
return copy.deepcopy(config)
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
...
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 190, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 343, in _reconstruct
y.__dict__.update(state)
AttributeError: 'NoneType' object has no attribute 'update'
Update 1:
I also tried a different approach. Train a model as normal, then just prepend a second model that does the preprocessing:
# Regular model, trained as usual
model = ...
# Preprocessing model
preproc_model = Sequential()
mean_tensor = K.constant(mean, name="mean_tensor")
std_tensor = K.constant(std, name="std_tensor")
preproc_layer = Lambda(lambda x: (x - mean_tensor) / (std_tensor + K.epsilon()),
                       input_shape=im_shape, name="normalisation")
preproc_model.add(preproc_layer)
# Prepend the preprocessing model to the regular model
full_model = Model(inputs=[preproc_model.input],
                   outputs=[model(preproc_model.output)])
# Save the complete model to disk
full_model.save('full_model.hdf5')
This seems to work until the save() call, which fails with the same deep stack trace as above.
Perhaps the Lambda layer is the problem, but judging from this issue it seems it should serialise properly.
So overall, how do I add a normalisation layer to a Keras model without compromising the ability to serialise it (and export it to pb)?
I'm sure you can get it working by dropping down to TF directly (e.g. this thread, or using tf.Transform), but I would have thought it would be possible in Keras directly.
Update 2:
So I found that the deep stack trace could be avoided by doing
def foo(x):
    bar = K.variable(baz, name="baz")
    return x - bar
That is, defining bar inside the function instead of capturing it from the outer scope.
I then found I could save to disk but could not load from disk. There is a suite of GitHub issues around this. I used the workaround specified in #5396 to pass all variables in as arguments; this then allowed me to save and load.
Thinking I was almost there, I continued with my approach from Update 1 above of stacking a pre-processing model in front of a trained model.
This then led to Model is not compiled errors. I worked around those, but in the end I never managed to get the following to work:
Build and train a model
Save it to disk
Load it, prepend a preprocessing model
Export the stacked model to disk as a frozen pb file
Load the frozen pb from disk
Apply it on some unseen data
I got it to the point where there were no errors, but could not get the normalisation tensors to propagate through to the frozen pb. Having spent too much time on this I then gave up and switched to the somewhat less elegant approach of:
Build a model with the preprocessing operations in the model from the start but set to a no-op (mean=0, std=1)
Train the model
Build an identical model, but this time with the proper values for mean/std
Transfer the weights
Export and freeze the model to pb
All this now fully works as expected. Small overhead on training but negligible for me.
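For reference, a minimal sketch of that final workflow; the layer stack after the normalisation and the x_train/y_train/get_mean_std names are placeholders, not my actual model:

import numpy as np
from keras.models import Sequential
from keras.layers import Lambda, Conv2D, Flatten, Dense
from keras import backend as K

def build_model(mean, std):
    # mean/std are baked into the Lambda as constants; (0, 1) makes it a no-op
    mean_tensor = K.constant(mean, name="mean_tensor")
    std_tensor = K.constant(std, name="std_tensor")
    model = Sequential()
    model.add(Lambda(lambda x: (x - mean_tensor) / (std_tensor + K.epsilon()),
                     input_shape=im_shape, output_shape=im_shape))
    model.add(Conv2D(32, (3, 3), activation="relu"))
    model.add(Flatten())
    model.add(Dense(10, activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    return model

# 1. train with a no-op normalisation layer
train_model = build_model(np.zeros((1, 1, 3)), np.ones((1, 1, 3)))
train_model.fit(x_train, y_train)

# 2. build an identical model with the real statistics and transfer the weights
mean, std = get_mean_std(x_train)
deploy_model = build_model(mean, std)
deploy_model.set_weights(train_model.get_weights())
# deploy_model can then be frozen and exported to pb as usual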
I still haven't figured out how one would set the value of a tensor variable in Keras (without raising the assign exception), but I can do without it for now.
I will accept @Daniel's answer as it got me going in the right direction.
Related question:
Add Tensorflow pre-processing to existing Keras model (for use in Tensorflow Serving)
When creating a variable, you must give it the "value", not the shape:
mean_tensor = K.variable(mean, name="mean_tensor")
std_tensor = K.variable(std, name="std_tensor")
Now, in Keras, you don't have to deal with sessions, graphs and things like that. You work only with layers, and inside Lambda layers (or loss functions) you may work with tensors.
For our Lambda layer, we need a more complex function, because shapes must match before you do a calculation. Since I don't know im_shape, I assumed it has 3 dimensions:
def myFunc(x):
    # reshape x in a way that is compatible with the tensors mean and std:
    # -1 is like a wildcard, it will be the value that matches the rest of the given shape;
    # (1, 1, 3) is chosen because it's the same shape as mean_tensor and std_tensor
    x = K.reshape(x, (-1, 1, 1, 3))
    result = (x - mean_tensor) / (std_tensor + K.epsilon())
    # now shape it back to the same shape it was before
    # (-1 is still necessary, it's the batch size)
    return K.reshape(result, (-1, im_shape[0], im_shape[1], im_shape[2]))
Now we create the Lambda layer, considering it also needs an output shape (because of your custom operation, the system does not necessarily know the output shape):
model.add(Lambda(myFunc, input_shape=im_shape, output_shape=im_shape))
After this, just compile the model and train it. (Often with model.compile(...) and model.fit(...))
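For instance (the optimiser, loss and the x_train/y_train arrays are just placeholders):

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=32)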
If you want to include everything, computing the mean/std inside the function as well, that works too:
def myFunc(x):
    # considering shapes of (size, width, height, channels)
    mean_tensor = K.mean(x, axis=[0, 1, 2])
    std_tensor = K.std(x, axis=[0, 1, 2])
    x = K.reshape(x, (-1, 3))  # shapes of mean and std are (3,) here
    result = (x - mean_tensor) / (std_tensor + K.epsilon())
    return K.reshape(result, (-1, width, height, 3))
Now, all this is extra calculation in your model and will consume processing time.
It's better to just do everything outside the model: create the preprocessed data first and store it, then create the model without this preprocessing layer. This way you get a faster model. (This can be important if your data or your model is very big.)
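For example, a minimal sketch of that route, assuming x_train/x_val are numpy arrays of shape (samples, height, width, channels):

import numpy as np

# per-channel statistics computed once over the training set
mean = x_train.mean(axis=(0, 1, 2), keepdims=True)
std = x_train.std(axis=(0, 1, 2), keepdims=True)

x_train_prep = (x_train - mean) / (std + 1e-7)
x_val_prep = (x_val - mean) / (std + 1e-7)  # reuse the training statistics
np.save('x_train_prep.npy', x_train_prep)   # store once, reuse for every experiment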
Related
TLDR - what is considered best practice when extracting feature maps from ResNet?
I'm trying to feed the entire CIFAR10 dataset through ResNet18, to extract a new dataset that consists of some non-output activation of every sample in CIFAR10. I have implemented code that generates this dataset, but the running time is too long and it exceeds the memory available on the free Google Colab tier (which is quite a lot of RAM). The code I've implemented is based on a blog post called Intermediate Activations — the forward hook.
activation = {}

def get_activation(name):
    """
    When given as input to register_forward_hook, this function is implicitly called when model.forward() is performed
    and saves the output of layer 'name' in the dictionary described above.
    :param name:
    :return:
    """
    def hook(model, input, output):
        activation[name] = output.detach()
    return hook
The get_activation helper function is used inside the activation_maps function, which takes the feature map produced by the conv1 layer of the 2nd BasicBlock of the 4th layer of ResNet18: (batch_size, 3, 224, 224) -> (batch_size, 512, 7, 7).
(PS - this layer was arbitrarily chosen - is there a known layer from which the activations are better?)
ResNet18 = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)

def activation_maps(name='conv1'):
    """
    This function takes a batch and returns some non-last activation alongside the true labels
    :return: train_activations_and_true_labels: array of tuples (Activation, True_labels) as train data
    """
    non_output_activation_map = ResNet18.layer4[1].register_forward_hook(get_activation(name))
    # now we create a list of activations and true labels for every sample.
    # This means that if we looped over (X,y) in a dataloader, we can now loop (activation,y) which is
    # an element in the arrays below, like a regular dataloader.
    train_activations_and_true_labels = []
    for i, (X_train, y_train) in enumerate(train_dataloader):
        out = ResNet18(X_train)
        train_activations_and_true_labels.append((activation[name], y_train))
        print(f"Training data [{i}/{len(train_dataloader)}]", end='\r')
    non_output_activation_map.remove()  # detaching hooks
    return train_activations_and_true_labels
Now, this code runs, but it exceeds the memory capacity of my PyCharm/Google Colab session. Am I missing something? What is the best approach when extracting feature maps?
What batch size are you using, and how much RAM do you have available? ResNet is a somewhat large model, and the layer you're extracting is quite large as well, so storing all of that in memory might be causing issues.
Try reducing your batch size, or storing intermediary results to disk and clearing them from memory.
You might also consider turning off gradient computation when calling the ResNet18 model; this would save a good bit of memory. Putting the @torch.no_grad() decorator on activation_maps(name='conv1') might work.
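For example, a rough sketch combining both suggestions, reusing the names from the question (ResNet18, get_activation, activation, train_dataloader); moving each result off the GPU inside the loop is an extra assumption:

import torch

@torch.no_grad()  # no gradients are tracked anywhere inside this function
def activation_maps(name='conv1'):
    handle = ResNet18.layer4[1].register_forward_hook(get_activation(name))
    train_activations_and_true_labels = []
    for X_train, y_train in train_dataloader:
        ResNet18(X_train)
        # keep only the (already detached) activation, moved to CPU memory
        train_activations_and_true_labels.append((activation[name].cpu(), y_train))
    handle.remove()
    return train_activations_and_true_labels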
I'd like to use pre-trained sentence embeddings in my TensorFlow graph execution model. The embeddings are available dynamically from a function call, which takes in an array of sentences and outputs an array of sentence embeddings. This function uses a pre-trained PyTorch model, so it has to remain separate from the TensorFlow model I'm training:
def get_pretrained_embeddings(sentences):
    return pretrained_pytorch_model.encode(sentences)
My tensorflow model looks like this:
class SentenceModel(tf.keras.Model):
    def __init__(self):
        super().__init__()

    def call(self, sentences):
        embedding_layer = tf.keras.layers.Embedding(
            10_000,
            256,
            embeddings_initializer=tf.keras.initializers.Constant(get_pretrained_embeddings(sentences)),
            trainable=False,
        )
        sentence_text_embedding = tf.keras.Sequential([
            embedding_layer,
            tf.keras.layers.GlobalAveragePooling1D(),
        ])
        return sentence_text_embedding,
But when I try to train this model using
cached_train = train.shuffle(100_000).batch(1024)
model.fit(cached_train)
my embeddings_initializer call gets the error:
OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.
I assume this is because tensorflow is trying to compile the graph using symbolic data. How can I get my external function, which relies on the current training data batch, to work with tensorflow's graph training?
TensorFlow compiles models to an execution graph before performing the actual training process. The obvious side effect that clues us into this is that if we have a regular Python print() statement in e.g. our call() method, it will only get executed once, as TensorFlow runs through your code to construct the execution graph, which it will later convert to native code.
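A toy illustration of this (the model and data here are made up just to show the effect):

import tensorflow as tf

class ToyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, x):
        print("tracing call()")  # runs only while the graph is being built
        return self.dense(x)

model = ToyModel()
model.compile(optimizer="adam", loss="mse")
model.fit(tf.random.normal((64, 4)), tf.random.normal((64, 1)), epochs=2, verbose=0)
# "tracing call()" is printed only a couple of times (once per trace),
# not once per batch or per epoch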
The other side effect of this is that you cannot use anything that isn't a tensor of some description when training. By 'tensor' here, all of the following can be considered a tensor:
The input value of your call() method (obviously)
A tf.keras.Sequential
A tf.keras.Model/tf.keras.layers.Layer subclass
A SparseTensor
A tf.constant()
....probably more I haven't listed here.
To this end, you would need to convert your PyTorch model to a Tensorflow one to be able to reference it in a subclass of tf.keras.Model/tf.keras.layers.Layer.
As a side note, if you do find you need to iterate a tensor, you should just be able to iterate it on the 1st dimension (i.e. the batch size) like so:
for part in some_tensor:
    pass
If you want to iterate on some other dimension, I recommend doing a tf.unstack(some_tensor, axis=AXIS_NUMBER_HERE) first and iterate over the result thereof.
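For instance (the tensor here is just a made-up example):

import tensorflow as tf

some_tensor = tf.random.normal((4, 3, 2))
for part in tf.unstack(some_tensor, axis=1):  # yields 3 tensors of shape (4, 2)
    print(part.shape)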
Using PyTorch, I am trying to implement a network that uses the pre-trained DeepLab ResNet-101.
I found two possible methods for using this network:
this one
or
torchvision.models.segmentation.deeplabv3_resnet101(
    pretrained=False, progress=True, num_classes=21, aux_loss=None, **kwargs)
However, I might not only need this network's output, but also several inside layers' outputs.
Is there a way to access the inner layer outputs using one of these methods?
If not, is it possible to manually copy the trained resnet's parameters so I can manually recreate it and add those outputs myself? (Hopefully the first option is possible so I won't need to do this.)
Thanks!
You can achieve this without too much trouble using forward hooks.
The idea is to loop over the modules of your model, find the layers you're interested in, and hook a callback function onto them. When called, those layers will trigger the hook. We will take advantage of this to save the intermediate outputs.
For example, let's say you want to get the outputs of layer classifier.0.convs.3.1:
layers = ['classifier.0.convs.3.1']
activations = {}

def forward_hook(name):
    def hook(module, x, y):
        activations[name] = y
    return hook

for name, module in model.named_modules():
    if name in layers:
        module.register_forward_hook(forward_hook(name))
*The closure around hook() created by forward_hook's scope is used to capture the module's name, which you wouldn't otherwise have access to at this point.
Everything is ready; we can call the model:
>>> model = torchvision.models.segmentation.deeplabv3_resnet101(
...     pretrained=True, progress=True, num_classes=21, aux_loss=None)
>>> model(torch.rand(16, 3, 100, 100))
And as expected, after inference, activations will have a new entry 'classifier.0.convs.3.1' which - in this case - will contain a tensor of shape (16, 256, 13, 13).
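For example:

>>> activations['classifier.0.convs.3.1'].shape
torch.Size([16, 256, 13, 13])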
Not so long ago, I wrote an answer about a similar question which goes a little bit more in detail on how hooks can be used to inspect the intermediate output shapes.
AKA Keras Model subclassing magic.
While playing with Keras, I noticed that ResNetBlock.layers gets populated as I put new layer instances into collections I previously attached to my custom model.
class ResNetBlock(Model):
    PART_COUNT = 3

    def __init__(self, kernel_size, filters):
        super().__init__()
        self.convs = []
        self.batchNorms = []
        for part in range(ResNetBlock.PART_COUNT):
            if part == 1:
                conv = Conv2D(filters[part], kernel_size=kernel_size, padding="same")
            else:
                conv = Conv2D(filters[part], kernel_size=(1, 1))
            self.convs.append(conv)
            self.batchNorms.append(BatchNormalization())

resnet = ResNetBlock(1, [1, 2, 3])
print(resnet.layers)  # actually prints a non-empty list,
                      # filled with the Conv2Ds and BNs from above
Adapted from the official tutorial: https://www.tensorflow.org/beta/tutorials/eager/custom_layers
A bit of digging into the TensorFlow source showed that some kind of tracking is done via __setattr__ in the Network class.
Now, the code is not trivial, the documentation is lacking, and it seems unclear whether the order of creating new layers/adding them to the respective collections matters at all. E.g. if I first fill the convs collection, and only then the batchNorms collection, would it still be the same model?
In most tutorials each layer is actually put into its own attribute.
Bonus question is: why is it done so implicitly? This kind of magic kinda breaks the motto to prefer explicit over implicit. What if for some reason I'd need to use a custom collection type not derived from list? How would I ensure these magic operations are done properly?
The order won't matter. What really defines your model is the call method. This stores the order of the operations (even if the order in which the weights were created varied, they would still be applied in the same graph with the same functions).
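As an illustration, a call method you could add to the ResNetBlock from the question might look like this; the ReLU and the purely sequential wiring (no skip connection) are assumptions made for the sketch, not something taken from your code:

import tensorflow as tf

def call(self, inputs):
    x = inputs
    # the operations applied here, in this order, are what define the model,
    # regardless of the order in which the layer lists were filled in __init__
    for conv, bn in zip(self.convs, self.batchNorms):
        x = tf.nn.relu(bn(conv(x)))
    return x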
Now, if you suspect that not using one attribute per layer, but another kind of storage for the layers, might fail to register a layer for some reason, you can double-check (after the block has been built, i.e. called on some input once) with:
print(len(resnet.weights))
The count should be 6 * PART_COUNT:
2 tensors for each conv layer (kernel and bias)
4 tensors for each BatchNormalization layer (scale, offset, moving mean and moving variance; the moving statistics are non-trainable, so len(resnet.trainable_weights) would be 4 * PART_COUNT)
I load features and labels from my training dataset. Both of them are originally numpy arrays, but I change them to torch tensors using torch.from_numpy(features.copy()) and torch.tensor(labels.astype(np.bool)).
And I notice that torch.autograd.Variable is something like a placeholder in TensorFlow.
When I train my network, first I tried
features = features.cuda()
labels = labels.cuda()
outputs = Config.MODEL(features)
loss = Config.LOSS(outputs, labels)
Then I tried
features = features.cuda()
labels = labels.cuda()
input_var = Variable(features)
target_var = Variable(labels)
outputs = Config.MODEL(input_var)
loss = Config.LOSS(outputs, target_var)
Both blocks succeed in running training, but I worried that there might be some subtle difference.
According to this question, you no longer need Variables to use PyTorch autograd.
Thanks to @skytree, we can make this even more explicit: Variables have been deprecated, i.e. you're not supposed to use them anymore.
Autograd automatically supports Tensors with requires_grad set to True.
And more importantly
Variable(tensor) and Variable(tensor, requires_grad) still work as expected, but they return Tensors instead of Variables.
This means that if your features and labels are tensors already (which they seem to be in your example), your Variable(features) and Variable(labels) will only return tensors again.
The original purpose of Variables was to be able to use automatic differentiation (Source):
Variables are just wrappers for the tensors so you can now easily auto compute the gradients.
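A minimal sketch of that tensor-only workflow:

import torch

x = torch.randn(3, requires_grad=True)  # a plain tensor, no Variable wrapper
y = (x ** 2).sum()
y.backward()
print(x.grad)  # gradients are stored on the tensor itself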