Extracting feature maps from ResNet - python

TLDR - what is considered best practice when extracting feature maps from ResNet?
I'm trying to feed the entire CIFAR10 dataset through ResNet18 to extract a new dataset consisting of some non-output activation for every sample in CIFAR10. I have implemented code that generates this dataset, but running it exhausts the free RAM that Google Colab provides (which is already a fair amount). The code I've implemented is based on a blog post called Intermediate Activations — the forward hook.
activation = {}

def get_activation(name):
    """
    When given as input to register_forward_hook, this function is implicitly called when model.forward() is performed
    and saves the output of layer 'name' in the dictionary described above.
    :param name:
    :return:
    """
    def hook(model, input, output):
        activation[name] = output.detach()
    return hook
The get_activation helper function is used inside the activation_maps function, which takes the feature map produced by ResNet18's 4th layer, 2nd BasicBlock, conv1 layer: (batch_size, 3, 224, 224) -> (batch_size, 512, 7, 7).
(PS - this layer was chosen arbitrarily - is there a known layer whose activations work better?)
ResNet18 = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)

def activation_maps(name='conv1'):
    """
    This function takes a batch and returns some non-last activation alongside the true labels
    :return: train_activations_and_true_labels: list of tuples (activation, true_labels) as train data
    """
    non_output_activation_map = ResNet18.layer4[1].register_forward_hook(get_activation(name))
    # now we create a list of activations and true labels for every sample.
    # This means that if we looped over (X, y) in a dataloader, we can now loop over (activation, y),
    # which is an element of the list below, like a regular dataloader.
    train_activations_and_true_labels = []
    for i, (X_train, y_train) in enumerate(train_dataloader):
        out = ResNet18(X_train)
        train_activations_and_true_labels.append((activation[name], y_train))
        print(f"Training data [{i}/{len(train_dataloader)}]", end='\r')
    non_output_activation_map.remove()  # detaching the hook
    return train_activations_and_true_labels
Now, this code runs, but it exceeds the memory capacity of my PyCharm/Google Colab setup. Am I missing something? What is the best approach when extracting feature maps?

What batch size are you using, and how much RAM do you have available? ResNet is a somewhat large model, and the layer you're extracting is quite large as well, so storing all of that in memory might be causing issues.
Try reducing your batch size, or storing intermediary results to disk and clearing them from memory.
You might also consider turning off gradient computation when calling the ResNet18 model; this would save a good bit of memory. Putting the @torch.no_grad() decorator on activation_maps(name='conv1') might work.
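For illustration, here is a minimal sketch of the extraction loop with those suggestions applied (gradients disabled, activations moved to the CPU, and intermediate results periodically written to disk). The file name and the flush interval are made-up placeholders, not part of the original code:
import torch

@torch.no_grad()  # no gradient bookkeeping, as suggested above
def extract_activations(name='conv1'):
    handle = ResNet18.layer4[1].register_forward_hook(get_activation(name))
    ResNet18.eval()
    pairs = []
    for i, (X_train, y_train) in enumerate(train_dataloader):
        ResNet18(X_train)
        # keep only a detached CPU copy of the activation for this batch
        pairs.append((activation[name].cpu(), y_train))
        if (i + 1) % 100 == 0:
            # hypothetical: flush intermediate results to disk and free the list
            torch.save(pairs, f"activations_part_{i + 1}.pt")
            pairs = []
    handle.remove()  # detach the hook
    return pairs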

Related

Custom reduction of losses within each batch in Keras

I am using Keras with TensorFlow in Python. I have a custom loss function that returns a single number for each sample in a batch (so a vector with length equal to the batch size). How can I also specify a custom reduction method to aggregate these sample losses into a single loss for the entire batch? Is it acceptable to include this reduction within the custom loss function and have the function return a single scalar rather than a vector of losses?
It really depends on your application and goal. A very common approach is to apply reduce_mean over the per-sample losses of the batch. Some also use reduce_sum, which of course makes the loss value depend on the batch size. A general (and maybe unnecessarily complicated) approach is to store a function that reduces the batch loss to a single value; let's call it reducer. In your loss function, call it on the last line, right before return:
class my_loss(keras.losses.Loss):
    def __init__(self, inputs):
        super().__init__()
        # a bunch of assignments
        self.reducer = self._get_reducer_function(inputs)  # or a plain mean function

    def call(self, y_true, y_pred):
        y_batch = ...
        return self.reducer(y_batch)

    def get_config(self):
        return {'input': 1}
Of course you don't need to make it this complicated, but this should give you an idea of how to do it. You can also simply add sample_weights if you need them.
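As a concrete, hypothetical illustration of the reduce_mean variant (the squared-error body is just an example, not taken from the question), a per-sample loss can be reduced to a scalar inside the loss class like this:
import tensorflow as tf
from tensorflow import keras

class MeanReducedLoss(keras.losses.Loss):
    """Per-sample squared error, collapsed to a single scalar per batch."""
    def call(self, y_true, y_pred):
        # per-sample losses, shape (batch_size,)
        per_sample = tf.reduce_sum(tf.square(y_true - y_pred), axis=-1)
        # custom reduction: one scalar for the whole batch
        return tf.reduce_mean(per_sample)

# usage: model.compile(optimizer="adam", loss=MeanReducedLoss())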

Accessing training data during tensorflow graph execution

I'd like to use pre-trained sentence embeddings in my tensorflow graph execution model. The embeddings are available dynamically from a function call, which takes in an array of sentences and outputs an array of sentence embeddings. This function uses a pre-trained pytorch model, so it has to remain separate from the tensorflow model I'm training:
def get_pretrained_embeddings(sentences):
    return pretrained_pytorch_model.encode(sentences)
My tensorflow model looks like this:
class SentenceModel(tf.keras.Model):
    def __init__(self):
        super().__init__()

    def call(self, sentences):
        embedding_layer = tf.keras.layers.Embedding(
            10_000,
            256,
            embeddings_initializer=tf.keras.initializers.Constant(get_pretrained_embeddings(sentences)),
            trainable=False,
        )
        sentence_text_embedding = tf.keras.Sequential([
            embedding_layer,
            tf.keras.layers.GlobalAveragePooling1D(),
        ])
        return sentence_text_embedding,
But when I try to train this model using
cached_train = train.shuffle(100_000).batch(1024)
model.fit(cached_train)
my embeddings_initializer call gets the error:
OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.
I assume this is because tensorflow is trying to compile the graph using symbolic data. How can I get my external function, which relies on the current training data batch, to work with tensorflow's graph training?
Tensorflow compiles models to an execution graph before performing the actual training process. The obvious side effect that clues us into this is that a regular Python print() statement in e.g. our call() method only gets executed once, as Tensorflow runs through your code to construct the execution graph, which it will later convert to native code.
The other side effect of this is that you cannot use anything that isn't a tensor of some description when training. By 'tensor' here, all of the following can be considered a tensor:
The input value of your call() method (obviously)
A tf.keras.Sequential
A tf.keras.Model/tf.keras.layers.Layer subclass
A SparseTensor
A tf.constant()
....probably more I haven't listed here.
To this end, you would need to convert your PyTorch model to a Tensorflow one to be able to reference it in a subclass of tf.keras.Model/tf.keras.layers.Layer.
As a side note, if you do find you need to iterate a tensor, you should just be able to iterate it on the 1st dimension (i.e. the batch size) like so:
for part in some_tensor:
    pass
If you want to iterate on some other dimension, I recommend doing a tf.unstack(some_tensor, axis=AXIS_NUMBER_HERE) first and iterating over the result.
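A small sketch of that unstack pattern (the tensor contents and axis here are made up for illustration):
import tensorflow as tf

some_tensor = tf.reshape(tf.range(24), (2, 3, 4))

# iterate over the second dimension (axis=1) instead of the batch dimension
for part in tf.unstack(some_tensor, axis=1):
    print(part.shape)  # each part has shape (2, 4)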

Can I access the inner layer outputs of DeepLab in pytorch?

Using Pytorch, I am trying to implement a network that uses the pre-trained DeepLab ResNet-101.
I found two possible methods for using this network:
this one
or
torchvision.models.segmentation.deeplabv3_resnet101(
    pretrained=False, progress=True, num_classes=21, aux_loss=None, **kwargs)
However, I might not only need this network's output, but also several inside layers' outputs.
Is there a way to access the inner layer outputs using one of these methods?
If not - Is it possible to manually copy the trained resnet's parameters so I can manually recreate it and add those outputs myself? (Hopefully the first option is possible so I won't need to do this)
Thanks!
You can achieve this without too much trouble using forward hooks.
The idea is to loop over the modules of your model, find the layers you're interested in, and hook a callback function onto them. When called, those layers will trigger the hook, and we take advantage of this to save the intermediate outputs.
For example, let's say you want to get the outputs of layer classifier.0.convs.3.1:
layers = ['classifier.0.convs.3.1']
activations = {}

def forward_hook(name):
    def hook(module, x, y):
        activations[name] = y
    return hook

for name, module in model.named_modules():
    if name in layers:
        module.register_forward_hook(forward_hook(name))
The closure created by forward_hook's scope encloses the module's name, which you wouldn't otherwise have access to inside hook() at that point.
Everything is ready; we can now call the model:
>>> model = torchvision.models.segmentation.deeplabv3_resnet101(
...     pretrained=True, progress=True, num_classes=21, aux_loss=None)
>>> model(torch.rand(16, 3, 100, 100))
And as expected, after inference, activations will have a new entry 'classifier.0.convs.3.1' which - in this case - will contain a tensor of shape (16, 256, 13, 13).
Not so long ago, I wrote an answer about a similar question which goes a little bit more in detail on how hooks can be used to inspect the intermediate output shapes.
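If you run many inferences, it can also be worth keeping the handles that register_forward_hook returns so the hooks can be detached afterwards, as the ResNet example at the top of this page does with .remove(). A small sketch of that variant:
handles = []
for name, module in model.named_modules():
    if name in layers:
        handles.append(module.register_forward_hook(forward_hook(name)))

model(torch.rand(16, 3, 100, 100))  # populates the activations dict

for handle in handles:
    handle.remove()  # detach the hooks once the intermediate outputs are collected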

Adding a preprocessing layer to keras model and setting tensor values

How would one best add a preprocessing layer (e.g., subtract mean and divide by std) to a keras (v2.0.5) model so that the model becomes fully self-contained for deployment (possibly in a C++ environment)? I tried:
def getmodel():
    model = Sequential()
    mean_tensor = K.placeholder(shape=(1, 1, 3), name="mean_tensor")
    std_tensor = K.placeholder(shape=(1, 1, 3), name="std_tensor")
    preproc_layer = Lambda(lambda x: (x - mean_tensor) / (std_tensor + K.epsilon()),
                           input_shape=im_shape)
    model.add(preproc_layer)
    # Build the remaining model, perhaps set weights,
    ...
    return model
Then, somewhere else, set the mean/std on the model. I found the set_value function, so I tried the following:
m = getmodel()
mean, std = get_mean_std(..)
graph = K.get_session().graph
mean_tensor = graph.get_tensor_by_name("mean_tensor:0")
std_tensor = graph.get_tensor_by_name("std_tensor:0")
K.set_value(mean_tensor, mean)
K.set_value(std_tensor, std)
However the set_value fails with
AttributeError: 'Tensor' object has no attribute 'assign'
So set_value does not work as (the limited) docs would suggest. What would the proper way be to do this? Get the TF session, wrap all the training code in a with (session) and use feed_dict? I would have thought there would be a native keras way to set tensor values.
Instead of using a placeholder I tried setting the mean/std on model construction using either K.variable or K.constant:
mean_tensor = K.variable(mean, name="mean_tensor")
std_tensor = K.variable(std, name="std_tensor")
This avoids any set_value problems. Though I notice that if I try to train that model (which I know is not particularly efficient, as you are re-doing the normalisation for every image), it works, but at the end of the first epoch the ModelCheckpoint handler fails with a very deep stack trace:
...
File "/Users/dgorissen/Library/Python/2.7/lib/python/site-packages/keras/models.py", line 102, in save_model
'config': model.get_config()
File "/Users/dgorissen/Library/Python/2.7/lib/python/site-packages/keras/models.py", line 1193, in get_config
return copy.deepcopy(config)
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
...
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 190, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 343, in _reconstruct
y.__dict__.update(state)
AttributeError: 'NoneType' object has no attribute 'update'
Update 1:
I also tried a different approach. Train a model as normal, then just prepend a second model that does the preprocessing:
# Regular model, trained as usual
model = ...
# Preprocessing model
preproc_model = Sequential()
mean_tensor = K.constant(mean, name="mean_tensor")
std_tensor = K.constant(std, name="std_tensor")
preproc_layer = Lambda(lambda x: (x - mean_tensor) / (std_tensor + K.epsilon()),
input_shape=im_shape, name="normalisation")
preproc_model.add(preproc_layer)
# Prepend the preprocessing model to the regular model
full_model = Model(inputs=[preproc_model.input],
                   outputs=[model(preproc_model.output)])
# Save the complete model to disk
full_model.save('full_model.hdf5')
This seems to work until the save() call, which fails with the same deep stack trace as above.
Perhaps the Lambda layer is the problem, but judging from this issue it seems it should serialise properly.
So overall, how do I add a normalisation layer to a keras model without compromising the ability to serialise (and export to pb)?
I'm sure you can get it working by dropping down to TF directly (e.g. this thread, or using tf.Transform), but I would have thought it would be possible in keras directly.
Update 2:
So I found that the deep stack trace could be avoided by doing
def foo(x):
    bar = K.variable(baz, name="baz")
    return x - bar
So defining bar inside the function instead of capturing from the outside scope.
I then found I could save to disk but could not load from disk. There is a suite of github issues around this. I used the workaround specified in #5396 to pass all variables in as arguments, which then allowed me to save and load.
Thinking I was almost there I continued with my approach from Update 1 above of stacking a pre-processing model in front of a trained model.
This then led to Model is not compiled errors. Worked around those but in the end I never managed to get the following to work:
Build and train a model
Save it to disk
Load it, prepend a preprocessing model
Export the stacked model to disk as a frozen pb file
Load the frozen pb from disk
Apply it on some unseen data
I got it to the point where there were no errors, but could not get the normalisation tensors to propagate through to the frozen pb. Having spent too much time on this I then gave up and switched to the somewhat less elegant approach of:
Build a model with the preprocessing operations in the model from the start but set to a no-op (mean=0, std=1)
Train the model, build an identical model but this time with the proper values for mean/std.
Transfer the weights
Export and freeze the model to pb
All this now fully works as expected. Small overhead on training but negligible for me.
Still failed to figure out how one would set the value of a tensor variable in keras (without raising the assign exception) but can do without it for now.
Will accept @Daniel's answer as it got me going in the right direction.
Related question:
Add Tensorflow pre-processing to existing Keras model (for use in Tensorflow Serving)
When creating a variable, you must give it the "value", not the shape:
mean_tensor = K.variable(mean, name="mean_tensor")
std_tensor = K.variable(std, name="std_tensor")
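As a side note on the asker's set_value problem: K.set_value works on variables created this way (it was the placeholder tensors in the original attempt that have no assign op), so updating them later should look roughly like the sketch below, assuming mean and std are NumPy arrays of shape (1, 1, 3):
import numpy as np
from keras import backend as K

mean_tensor = K.variable(np.zeros((1, 1, 3)), name="mean_tensor")
std_tensor = K.variable(np.ones((1, 1, 3)), name="std_tensor")

# later, once the statistics are known (mean/std: NumPy arrays of matching shape)
K.set_value(mean_tensor, mean)
K.set_value(std_tensor, std)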
Now, in Keras, you don't have to deal with session, graph and things like that. You work only with layers, and inside Lambda layers (or loss functions) you may work with tensors.
For our Lambda layer, we need a more complex function, because shapes must match before you do a calculation. Since I don't know im_shape, I assumed it has 3 dimensions:
def myFunc(x):
    # reshape x in a way that is compatible with the tensors mean and std:
    # -1 is like a wildcard; it will be whatever value matches the rest of the given shape.
    # I chose (1, 1, 3) because it's the same shape as mean_tensor and std_tensor.
    x = K.reshape(x, (-1, 1, 1, 3))
    result = (x - mean_tensor) / (std_tensor + K.epsilon())
    # now shape it back to the same shape it was before (which I don't know);
    # -1 is still necessary, it's the batch size
    return K.reshape(result, (-1, im_shape[0], im_shape[1], im_shape[2]))
Now we create the Lambda layer, considering it also needs an output shape (because of your custom operation, the system does not necessarily know the output shape):
model.add(Lambda(myFunc,input_shape=im_shape, output_shape=im_shape))
After this, just compile the model and train it. (Often with model.compile(...) and model.fit(...))
If you want to include everything, including the preprocessing inside the function, ok too:
def myFunc(x):
    mean_tensor = K.mean(x, axis=[0, 1, 2])  # considering shapes of (size, width, height, channels)
    std_tensor = K.std(x, axis=[0, 1, 2])
    x = K.reshape(x, (-1, 3))  # shapes of mean and std are (3,) here
    result = (x - mean_tensor) / (std_tensor + K.epsilon())
    return K.reshape(result, (-1, width, height, 3))
Now, all this is extra calculation in your model and will consume processing.
It's better to just do everything outside the model. Create the preprocessed data first and store it, then create the model without this preprocessing layer. This way you get a faster model. (It can be important if your data or your model is too big).
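A minimal sketch of that "preprocess outside the model" approach (the array names x_train/x_test are assumptions for the example):
import numpy as np

# compute per-channel statistics once, over the training set
mean = x_train.mean(axis=(0, 1, 2), keepdims=True)
std = x_train.std(axis=(0, 1, 2), keepdims=True)

# store the normalised copies and train the model on them directly,
# with no normalisation layer inside the model
x_train_norm = (x_train - mean) / (std + 1e-7)
x_test_norm = (x_test - mean) / (std + 1e-7)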

Tensorflow: What's the difference between using tf.map_fn() or tf.nn.dynamic_rnn() to apply layers before an LSTM?

This question is about coding strategy using Tensorflow. I would like to create a small classifier network made of:
1: an input
2: a simple fully connected layer (W*x + B)
3: an LSTM layer
4: a softmax layer
5: an output
In tensorflow, to use the class tf.nn.dynamic_rnn(), we need to feed a batch of sequences to the network. So far, it works perfectly (I love this library).
But as I want to apply a simple layer to each feature vector of my sequences (the 2nd layer in my description), I'm wondering:
Do I precede my LSTM layer with this simple layer and pass both to the tf.nn.dynamic_rnn() operation...
OR
Do I use the function tf.map_fn() twice (once to unpack batches, once to unpack sequences), which, if I understood well, is able to unpack my sequences and apply a layer to each feature line?
Normally, both should give me the same result? If that's the case, which should I use?
Thank you for your time!
I recently encountered a similar scenario, where I'd like to chain recurrent and non-recurrent layers.
Do I precede my LSTM layer with this simple layer and pass both to the
tf.nn.dynamic_rnn() operation...
This won't work. The function dynamic_rnn expects a cell as its first argument. A cell is a class that inherits from tf.nn.rnn_cell.RNNCell. Additionally, the second input argument to dynamic_rnn should be a tensor with at least 3 dimensions, where the first two dimensions are batch and time (time_major=False) or time and batch (time_major=True).
Do I use the function tf.map_fn() twice (once to unpack batches, once to unpack sequences), which, if I understood well, is able to unpack my sequences and apply a layer to each feature line?
This might work, but it doesn't appear to me to be an efficient and clean solution. Firstly, it should not be necessary to 'unpack batches', as you presumably want to perform some operation on batches of features and time steps, where each observation in a batch is independent of the others.
My solution to this particular problem was to create a sub-class of tf.nn.rnn_cell.RNNCell. In my case I wanted a simple feedforward layer that would iterate over all of the time steps and that could be used in dynamic_rnn:
import tensorflow as tf

class FeedforwardCell(tf.nn.rnn_cell.RNNCell):
    """A stateless feedforward cell that can be used with MultiRNNCell."""

    def __init__(self, num_units, activation=tf.tanh, dtype=tf.float32):
        self._num_units = num_units
        self._activation = activation
        # Store a dummy state to make dynamic_rnn happy.
        self.dummy = tf.constant([[0.0]], dtype=dtype)

    @property
    def state_size(self):
        return 1

    @property
    def output_size(self):
        return self._num_units

    def zero_state(self, batch_size, dtype):
        return self.dummy

    def __call__(self, inputs, state, scope=None):
        """Basic feedforward: output = activation(W * input)."""
        with tf.variable_scope(scope or type(self).__name__):  # "FeedforwardCell"
            output = self._activation(tf.nn.rnn_cell._linear(
                [inputs], self._num_units, True))
        return output, self.dummy
An instance of this class can be passed, in a list with "normal" RNN cells, to a tf.nn.rnn_cell.MultiRNNCell initializer. The resulting object instance can be passed as the cell input argument to dynamic_rnn.
Important to note: dynamic_rnn expects that a recurrent cell returns a state when called. I therefore use dummy in FeedforwardCell as a fake state variable.
My solution might not be the smoothest or best way to chain recurrent and non-recurrent layers together. I'd be interested in hearing from other Tensorflow users about their suggestions.
Edit
If you choose to use the sequence_length input argument of dynamic_rnn, then state_size should be self._num_units and the dummy state should have shape [batch_size, self.state_size]. In other words, the state cannot be a scalar. Note that bidirectional_dynamic_rnn requires that the sequence_length argument is not None, whereas dynamic_rnn does not have this requirement. (This is weakly documented in the TF documentation.)
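For reference, a usage sketch of that stacking (TF 1.x API; the sizes and placeholder are made up, and per the Edit note above the dummy state may need shape [batch_size, state_size] in some configurations):
import tensorflow as tf

batch_size, max_time, num_features = 32, 20, 10
inputs = tf.placeholder(tf.float32, [batch_size, max_time, num_features])

stacked_cell = tf.nn.rnn_cell.MultiRNNCell([
    FeedforwardCell(num_units=64),            # the feedforward "cell" defined above
    tf.nn.rnn_cell.LSTMCell(num_units=128),   # standard LSTM on top of its output
])
outputs, final_state = tf.nn.dynamic_rnn(stacked_cell, inputs, dtype=tf.float32)
# outputs has shape (batch_size, max_time, 128); a softmax layer can then be applied per step.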
