Using PyTorch, I am trying to implement a network that uses the pre-trained DeepLab ResNet-101.
I found two possible methods for using this network:
this one
or
torchvision.models.segmentation.deeplabv3_resnet101(
    pretrained=False, progress=True, num_classes=21, aux_loss=None, **kwargs)
However, I may need not only this network's output, but also the outputs of several inner layers.
Is there a way to access the inner layer outputs using one of these methods?
If not - Is it possible to copy the trained ResNet's parameters so I can recreate it manually and add those outputs myself? (Hopefully the first option is possible so I won't need to do this)
Thanks!
You can achieve this without too much trouble using forward hooks.
The idea is to loop over the modules of your model, find the layers you're interested in, and hook a callback function onto them. When called, those layers will trigger the hook. We will take advantage of this to save the intermediate outputs.
For example, let's say you want to get the outputs of layer classifier.0.convs.3.1:
layers = ['classifier.0.convs.3.1']
activations = {}
def forward_hook(name):
    def hook(module, x, y):
        activations[name] = y
    return hook

for name, module in model.named_modules():
    if name in layers:
        module.register_forward_hook(forward_hook(name))
*The closure created by forward_hook's scope is used to capture the module's name, which you wouldn't otherwise have access to inside hook().
Everything is ready; we can call the model:
>>> model = torchvision.models.segmentation.deeplabv3_resnet101(
    pretrained=True, progress=True, num_classes=21, aux_loss=None)
>>> model(torch.rand(16, 3, 100, 100))
And as expected, after inference, activations will have a new entry 'classifier.0.convs.3.1' which - in this case - will contain a tensor of shape (16, 256, 13, 13).
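One small refinement: register_forward_hook returns a handle, so if you keep those handles you can detach the hooks once you're done collecting activations. A sketch of the same registration loop with cleanup:
handles = [module.register_forward_hook(forward_hook(name))
           for name, module in model.named_modules() if name in layers]

with torch.no_grad():
    model(torch.rand(16, 3, 100, 100))

print(activations['classifier.0.convs.3.1'].shape)  # torch.Size([16, 256, 13, 13])

for handle in handles:
    handle.remove()  # detach the hooks once you're done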
Not so long ago, I wrote an answer about a similar question which goes a little bit more in detail on how hooks can be used to inspect the intermediate output shapes.
Related
I'd like to use pre-trained sentence embeddings in my tensorflow graph execution model. The embeddings are available dynamically from a function call, which takes in an array of sentences and outputs an array of sentence embeddings. This function uses a pre-trained pytorch model so has to remain separate from the tensorflow model I'm training:
def get_pretrained_embeddings(sentences):
    return pretrained_pytorch_model.encode(sentences)
My tensorflow model looks like this:
class SentenceModel(tf.keras.Model):
    def __init__(self):
        super().__init__()

    def call(self, sentences):
        embedding_layer = tf.keras.layers.Embedding(
            10_000,
            256,
            embeddings_initializer=tf.keras.initializers.Constant(get_pretrained_embeddings(sentences)),
            trainable=False,
        )
        sentence_text_embedding = tf.keras.Sequential([
            embedding_layer,
            tf.keras.layers.GlobalAveragePooling1D(),
        ])
        return sentence_text_embedding,
But when I try to train this model using
cached_train = train.shuffle(100_000).batch(1024)
model.fit(cached_train)
my embeddings_initializer call gets the error:
OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.
I assume this is because tensorflow is trying to compile the graph using symbolic data. How can I get my external function, which relies on the current training data batch, to work with tensorflow's graph training?
Tensorflow compiles models to an execution graph before performing the actual training process. The obvious side effect that clues us into this is that a regular Python print() statement in, e.g., our call() method will only be executed once, as Tensorflow runs through your code to construct the execution graph, which it will later convert to native code.
The other side effect is that you cannot use anything that isn't a tensor of some description when training. By 'tensor' here, all of the following can be considered tensors:
The input value of your call() method (obviously)
A tf.Sequential
A tf.keras.Model/tf.keras.layers.Layer subclass
A SparseTensor
A tf.constant()
....probably more I haven't listed here.
To this end, you would need to convert your PyTorch model to a Tensorflow one to be able to reference it in a subclass of tf.keras.Model/tf.keras.layers.Layer.
As a side note, if you do find you need to iterate a tensor, you should just be able to iterate it on the 1st dimension (i.e. the batch size) like so:
for part in some_tensor:
    pass
If you want to iterate on some other dimension, I recommend doing a tf.unstack(some_tensor, axis=AXIS_NUMBER_HERE) first and iterate over the result thereof.
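For instance, a minimal sketch in eager code (some_tensor is just a placeholder name for whatever tensor you have):
import tensorflow as tf

some_tensor = tf.random.uniform((4, 3, 2))

# Iterating directly walks the first (batch) dimension
for part in some_tensor:
    print(part.shape)  # (3, 2)

# To iterate over another dimension, unstack along that axis first
for part in tf.unstack(some_tensor, axis=1):
    print(part.shape)  # (4, 2)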
AKA Keras Model subclassing magic.
While playing with Keras, I noticed that ResNetBlock.layers gets populated as I put new layer instances into collections that I previously attached to my custom model.
class ResNetBlock(Model):
    PART_COUNT = 3

    def __init__(self, kernel_size, filters):
        super().__init__()
        self.convs = []
        self.batchNorms = []
        for part in range(ResNetBlock.PART_COUNT):
            if part == 1:
                conv = Conv2D(filters[part], kernel_size=kernel_size, padding="same")
            else:
                conv = Conv2D(filters[part], kernel_size=(1,1))
            self.convs.append(conv)
            self.batchNorms.append(BatchNormalization())
resnet = ResNetBlock(1, [1, 2, 3])
print(resnet.layers) # actually prints non-empty list
# filled with Conv2Ds and BNs from above
Adopted from official tutorial: https://www.tensorflow.org/beta/tutorials/eager/custom_layers
A bit of digging into the TensorFlow source showed that some kind of tracking is done via __setattr__ in the Network class.
Now, the code is not trivial, the documentation is lacking, and it is unclear whether the order of creating new layers / adding them to their respective collections matters at all. E.g. if I first fill in the convs collection, and only then the batchNorms collection, would it still be the same model?
In most tutorials each layer is actually put into its own attribute.
Bonus question is: why is it done so implicitly? This kind of magic kinda breaks the motto to prefer explicit over implicit. What if for some reason I'd need to use a custom collection type not derived from list? How would I ensure these magic operations are done properly?
The order won't matter. What really defines your model is the call method: it determines the order of the operations (even if the order in which the weights were registered varied, they would still be applied in the same graph with the same functions).
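For illustration, a hypothetical call() for the ResNetBlock above (a sketch, not the asker's actual code):
import tensorflow as tf

class ResNetBlock(Model):
    # ... __init__ exactly as in the question ...

    # Hypothetical call(): the computation is defined by the order of the
    # operations written here, not by the order in which the layers were
    # appended to self.convs / self.batchNorms in __init__.
    def call(self, inputs, training=False):
        x = inputs
        for conv, bn in zip(self.convs, self.batchNorms):
            x = conv(x)
            x = bn(x, training=training)
            x = tf.nn.relu(x)
        return x  # a full ResNet block would also add the residual connection here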
Now, if you suspect that storing the layers in a collection instead of a dedicated attribute might fail to register them for some reason, you can double-check with:
print(len(resnet.trainable_weights))
The count should be 4 * PART_COUNT:
2 tensors per Conv2D layer (kernel and bias)
2 trainable tensors per BatchNormalization layer (scale and offset)
The moving mean and variance of the BatchNormalization layers are non-trainable, so counting len(resnet.weights) instead gives 6 * PART_COUNT.
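A quick sanity check along those lines, reusing the hypothetical call() sketched above (one forward pass is needed so the sub-layers actually build their weight tensors):
block = ResNetBlock(1, [1, 2, 3])
block(tf.zeros((1, 8, 8, 3)))  # building happens on the first call

print(len(block.weights))            # 6 * PART_COUNT = 18
print(len(block.trainable_weights))  # 4 * PART_COUNT = 12 (moving statistics excluded)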
I just recently started playing around with Keras and got into making custom layers. However, I am rather confused by the many different types of layers with slightly different names but with the same functionality.
For example, there are 3 different forms of the concatenate function from https://keras.io/layers/merge/ and https://www.tensorflow.org/api_docs/python/tf/keras/backend/concatenate
keras.layers.Concatenate(axis=-1)
keras.layers.concatenate(inputs, axis=-1)
tf.keras.backend.concatenate()
I know the 2nd one is used for functional API but what is the difference between the 3? The documentation seems a bit unclear on this.
Also, for the 3rd one, I have seen a code that does this below. Why must there be the line ._keras_shape after the concatenation?
# Concatenate the summed atom and bond features
atoms_bonds_features = K.concatenate([atoms, summed_bond_features], axis=-1)
# Compute fingerprint
atoms_bonds_features._keras_shape = (None, max_atoms, num_atom_features + num_bond_features)
Lastly, under keras.layers, there always seems to be 2 duplicates. For example, Add() and add(), and so on.
First, the backend: tf.keras.backend.concatenate()
Backend functions are supposed to be used "inside" layers. You'd only use this in Lambda layers, custom layers, custom loss functions, custom metrics, etc.
It works directly on "tensors".
It's not the right choice if you're not going deep into customization. (And it was a bad choice in your example code -- see details at the end.)
If you dive deep into keras code, you will notice that the Concatenate layer uses this function internally:
import keras.backend as K
class Concatenate(_Merge):
    #blablabla
    def _merge_function(self, inputs):
        return K.concatenate(inputs, axis=self.axis)
    #blablabla
Then, the Layer: keras.layers.Concatenate(axis=-1)
As any other keras layers, you instantiate and call it on tensors.
Pretty straightforward:
#in a functional API model:
inputTensor1 = Input(shape) #or some tensor coming out of any other layer
inputTensor2 = Input(shape2) #or some tensor coming out of any other layer
#first parentheses are creating an instance of the layer
#second parentheses are "calling" the layer on the input tensors
outputTensor = keras.layers.Concatenate(axis=someAxis)([inputTensor1, inputTensor2])
This is not suited for sequential models, unless the previous layer outputs a list (this is possible but not common).
Finally, the concatenate function from the layers module: keras.layers.concatenate(inputs, axis=-1)
This is not a layer. This is a function that will return the tensor produced by an internal Concatenate layer.
The code is simple:
def concatenate(inputs, axis=-1, **kwargs):
    #blablabla
    return Concatenate(axis=axis, **kwargs)(inputs)
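Usage is the same as with the layer above, just without instantiating it yourself:
outputTensor = keras.layers.concatenate([inputTensor1, inputTensor2], axis=someAxis)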
Older functions
In Keras 1, people had functions that were meant to receive "layers" as input and return an output "layer". Their names were related to the merge word.
But since Keras 2 doesn't mention or document these, I'd probably avoid using them, and if old code is found, I'd probably update it to a proper Keras 2 code.
Why the _keras_shape property?
This backend function was not supposed to be used in high-level code. The coder should have used a Concatenate layer.
atoms_bonds_features = Concatenate(axis=-1)([atoms, summed_bond_features])
#just this line is perfect
Keras layers add the _keras_shape property to all their output tensors, and Keras uses this property for infering the shapes of the entire model.
If you use any backend function "outside" a layer or loss/metric, your output tensor will lack this property and an error will appear telling _keras_shape doesn't exist.
The coder is creating a bad workaround by adding the property manually, when it should have been added by a proper Keras layer. (This may work now, but if Keras is updated this code will break, while proper code will keep working.)
Keras historically supports two different interfaces for its layers: the new functional one and the old one that requires model.add() calls, hence the two different functions.
As for TF: its concatenate() function does not do everything required for Keras to work, hence the additional call to set the ._keras_shape attribute manually so as not to upset Keras, which expects that attribute to have a particular value.
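For example (a sketch, with x1 and x2 standing in for any two Keras tensors of the same shape):
#Layer class: instantiate it, then call it on a list of tensors
summed = keras.layers.Add()([x1, x2])

#Functional helper: creates the Add layer internally and returns its output tensor
summed = keras.layers.add([x1, x2])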
How would one best add a preprocessing layer (e.g., subtract mean and divide by std) to a Keras (v2.0.5) model such that the model becomes fully self-contained for deployment (possibly in a C++ environment)? I tried:
def getmodel():
    model = Sequential()
    mean_tensor = K.placeholder(shape=(1,1,3), name="mean_tensor")
    std_tensor = K.placeholder(shape=(1,1,3), name="std_tensor")
    preproc_layer = Lambda(lambda x: (x - mean_tensor) / (std_tensor + K.epsilon()),
                           input_shape=im_shape)
    model.add(preproc_layer)
    # Build the remaining model, perhaps set weights,
    ...
    return model
Then, somewhere else set the mean/std on the model. I found the set_value function so tried the following:
m = getmodel()
mean, std = get_mean_std(..)
graph = K.get_session().graph
mean_tensor = graph.get_tensor_by_name("mean_tensor:0")
std_tensor = graph.get_tensor_by_name("std_tensor:0")
K.set_value(mean_tensor, mean)
K.set_value(std_tensor, std)
However the set_value fails with
AttributeError: 'Tensor' object has no attribute 'assign'
So set_value does not work as (the limited) docs would suggest. What would the proper way be to do this? Get the TF session, wrap all the training code in a with (session) and use feed_dict? I would have thought there would be a native keras way to set tensor values.
Instead of using a placeholder I tried setting the mean/std on model construction using either K.variable or K.constant:
mean_tensor = K.variable(mean, name="mean_tensor")
std_tensor = K.variable(std, name="std_tensor")
This avoids any set_value problems. Though I notice that if I try to train that model (which I know is not particularly efficient as you are re-doing the normalisation for every image) it works but at the end of the first epoch the ModelCheckpoint handler fails with a very deep stack trace:
...
File "/Users/dgorissen/Library/Python/2.7/lib/python/site-packages/keras/models.py", line 102, in save_model
'config': model.get_config()
File "/Users/dgorissen/Library/Python/2.7/lib/python/site-packages/keras/models.py", line 1193, in get_config
return copy.deepcopy(config)
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
...
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 190, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 343, in _reconstruct
y.__dict__.update(state)
AttributeError: 'NoneType' object has no attribute 'update'
Update 1:
I also tried a different approach. Train a model as normal, then just prepend a second model that does the preprocessing:
# Regular model, trained as usual
model = ...
# Preprocessing model
preproc_model = Sequential()
mean_tensor = K.constant(mean, name="mean_tensor")
std_tensor = K.constant(std, name="std_tensor")
preproc_layer = Lambda(lambda x: (x - mean_tensor) / (std_tensor + K.epsilon()),
                       input_shape=im_shape, name="normalisation")
preproc_model.add(preproc_layer)

# Prepend the preprocessing model to the regular model
full_model = Model(inputs=[preproc_model.input],
                   outputs=[model(preproc_model.output)])
# Save the complete model to disk
full_model.save('full_model.hdf5')
This seems to work until the save() call, which fails with the same deep stack trace as above.
Perhaps the Lambda layer is the problem, but judging from this issue it seems it should serialise properly.
So overall, how do I append a normalisation layer to a Keras model without compromising the ability to serialise (and export to pb)?
I'm sure you can get it working by dropping down to TF directly (e.g. this thread, or using tf.Transform), but I would have thought it would be possible in Keras directly.
Update 2:
So I found that the deep stack trace could be avoided by doing
def foo(x):
    bar = K.variable(baz, name="baz")
    return x - bar
So defining bar inside the function instead of capturing from the outside scope.
I then found I could save to disk but could not load from disk. There is a suite of GitHub issues around this. I used the workaround specified in #5396 of passing all variables in as arguments, which then allowed me to save and load.
Thinking I was almost there I continued with my approach from Update 1 above of stacking a pre-processing model in front of a trained model.
This then led to "Model is not compiled" errors. I worked around those, but in the end I never managed to get the following to work:
Build and train a model
Save it to disk
Load it, prepend a preprocessing model
Export the stacked model to disk as a frozen pb file
Load the frozen pb from disk
Apply it on some unseen data
I got it to the point where there were no errors, but could not get the normalisation tensors to propagate through to the frozen pb. Having spent too much time on this I then gave up and switched to the somewhat less elegant approach of:
Build a model with the preprocessing operations in the model from the start but set to a no-op (mean=0, std=1)
Train the model, build an identical model but this time with the proper values for mean/std.
Transfer the weights
Export and freeze the model to pb
All this now fully works as expected. Small overhead on training but negligible for me.
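The weight-transfer step itself is just a sketch like this, assuming trained_model and deploy_model are the two identical-architecture models (the second one built with the real mean/std constants):
# trained_model was trained with the no-op normalisation (mean=0, std=1)
# deploy_model has the same architecture, with the real constants baked in
deploy_model.set_weights(trained_model.get_weights())
deploy_model.save('deploy_model.hdf5')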
I still failed to figure out how one would set the value of a tensor variable in Keras (without raising the assign exception), but I can do without it for now.
Will accept @Daniel's answer as it got me going in the right direction.
Related question:
Add Tensorflow pre-processing to existing Keras model (for use in Tensorflow Serving)
When creating a variable, you must give it the "value", not the shape:
mean_tensor = K.variable(mean, name="mean_tensor")
std_tensor = K.variable(std, name="std_tensor")
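(As a side note on the assign error from the question: K.set_value works on backend variables like these, not on placeholders, so with K.variable you can still change the values later. A small sketch with hypothetical numbers:)
import numpy as np
import keras.backend as K

mean_tensor = K.variable(np.zeros((1, 1, 3)), name="mean_tensor")
K.set_value(mean_tensor, np.array([[[103.9, 116.8, 123.7]]]))  # hypothetical channel means
print(K.get_value(mean_tensor))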
Now, in Keras, you don't have to deal with session, graph and things like that. You work only with layers, and inside Lambda layers (or loss functions) you may work with tensors.
For our Lambda layer, we need a more complex function, because shapes must match before you do a calculation. Since I don't know im_shape, I supposed it had 3 dimensions:
def myFunc(x):
    #reshape x in a way it's compatible with the tensors mean and std:
    x = K.reshape(x, (-1,1,1,3))
    #-1 is like a wildcard, it will be the value that matches the rest of the given shape.
    #I chose (1,1,3) because it's the same shape as mean_tensor and std_tensor

    result = (x - mean_tensor) / (std_tensor + K.epsilon())

    #now shape it back to the same shape it was before (which I don't know)
    return K.reshape(result, (-1, im_shape[0], im_shape[1], im_shape[2]))
    #-1 is still necessary, it's the batch size
Now we create the Lambda layer, considering it also needs an output shape (because of your custom operation, the system does not necessarily know the output shape):
model.add(Lambda(myFunc,input_shape=im_shape, output_shape=im_shape))
After this, just compile the model and train it. (Often with model.compile(...) and model.fit(...))
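For instance (a sketch with placeholder loss/optimizer choices):
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=32, epochs=10)  # x_train/y_train are your own data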
If you want to include everything, computing the mean/std inside the function too, that's also OK:
def myFunc(x):
    mean_tensor = K.mean(x, axis=[0,1,2]) #considering shapes of (size, width, height, channels)
    std_tensor = K.std(x, axis=[0,1,2])

    x = K.reshape(x, (-1,3)) #shapes of mean and std are (3,) here

    result = (x - mean_tensor) / (std_tensor + K.epsilon())
    return K.reshape(result, (-1, width, height, 3))
Now, all this is extra calculation in your model and will consume processing.
It's better to just do everything outside the model. Create the preprocessed data first and store it, then create the model without this preprocessing layer. This way you get a faster model. (It can be important if your data or your model is too big).
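A minimal sketch of that offline preprocessing, assuming x_train is a numpy array of images with shape (size, width, height, channels):
import numpy as np

# Compute the statistics once over the whole training set
mean = x_train.mean(axis=(0, 1, 2), keepdims=True)  # shape (1, 1, 1, 3)
std = x_train.std(axis=(0, 1, 2), keepdims=True)

# Normalise once, store the result, and train the model on it directly
x_train_norm = (x_train - mean) / (std + 1e-7)
np.save('x_train_norm.npy', x_train_norm)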
I'm trying to train an LSTM in Tensorflow using minibatches, but after training is complete I would like to use the model by submitting one example at a time to it. I can set up the graph within Tensorflow to train my LSTM network, but I can't use the trained result afterward in the way I want.
The setup code looks something like this:
#Build the LSTM model.
cellRaw = rnn_cell.BasicLSTMCell(LAYER_SIZE)
cellRaw = rnn_cell.MultiRNNCell([cellRaw] * NUM_LAYERS)
cell = rnn_cell.DropoutWrapper(cellRaw, output_keep_prob = 0.25)
input_data = tf.placeholder(dtype=tf.float32, shape=[SEQ_LENGTH, None, 3])
target_data = tf.placeholder(dtype=tf.float32, shape=[SEQ_LENGTH, None])
initial_state = cell.zero_state(batch_size=BATCH_SIZE, dtype=tf.float32)
with tf.variable_scope('rnnlm'):
    output_w = tf.get_variable("output_w", [LAYER_SIZE, 6])
    output_b = tf.get_variable("output_b", [6])
outputs, final_state = seq2seq.rnn_decoder(input_list, initial_state, cell, loop_function=None, scope='rnnlm')
output = tf.reshape(tf.concat(1, outputs), [-1, LAYER_SIZE])
output = tf.nn.xw_plus_b(output, output_w, output_b)
...Note the two placeholders, input_data and target_data. I haven't bothered including the optimizer setup. After training is complete and the training session is closed, I would like to set up a new session that uses the trained LSTM network whose input is provided by a completely different placeholder, something like:
with tf.Session() as sess:
    with tf.variable_scope("simulation", reuse=None):
        cellSim = cellRaw
        input_data_sim = tf.placeholder(dtype=tf.float32, shape=[1, 1, 3])
        initial_state_sim = cell.zero_state(batch_size=1, dtype=tf.float32)
        input_list_sim = tf.unpack(input_data_sim)
        outputsSim, final_state_sim = seq2seq.rnn_decoder(input_list_sim, initial_state_sim, cellSim, loop_function=None, scope='rnnlm')
        outputSim = tf.reshape(tf.concat(1, outputsSim), [-1, LAYER_SIZE])

        with tf.variable_scope('rnnlm'):
            output_w = tf.get_variable("output_w", [LAYER_SIZE, nOut])
            output_b = tf.get_variable("output_b", [nOut])

        outputSim = tf.nn.xw_plus_b(outputSim, output_w, output_b)
This second part returns the following error:
tensorflow.python.framework.errors.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype float
[[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
...Presumably because the graph I'm using still has the old training placeholders attached to the trained LSTM nodes. What's the right way to 'extract' the trained LSTM and put it into a new, different graph that has a different style of inputs? The variable scoping features that Tensorflow has seem to address something like this, but the examples in the documentation all talk about using variable scope as a way of managing variable names so that the same piece of code will generate similar subgraphs within the same graph. The 'reuse' feature seems to be close to what I want, but I don't find the Tensorflow documentation linked above to be clear at all on what it does. The cells themselves cannot be given a name (in other words,
cellRaw = rnn_cell.MultiRNNCell([cellRaw] * NUM_LAYERS, name="multicell")
is not valid), and while I can give a name to a seq2seq.rnn_decoder(), I presumably wouldn't be able to remove the rnn_cell.DropoutWrapper() if I used that node unchanged.
Questions:
What is the proper way to move trained LSTM weights from one graph to another?
Is it correct to say that starting a new session "releases resources", but doesn't erase the graph built in memory?
It seems to me like the 'reuse' feature allows Tensorflow to search outside of the current variable scope for variables with the same name (existing in a different scope), and use them in the current scope. Is this correct? If it is, what happens to all of the graph edges from the non-current scope that link to that variable? If it isn't, why does Tensorflow throw an error if you try to have the same variable name within two different scopes? It seems perfectly reasonable to define two variables with identical names in two different scopes, e.g. conv1/sum1 and conv2/sum1.
In my code I'm working within a new scope but the graph won't run without data to be fed into a placeholder from the initial, default scope. Is the default scope always 'in-scope' for some reason?
If graph edges can span different scopes, and names in different scopes can't be shared unless they refer to the exact same node, then that would seem to defeat the purpose of having different scopes in the first place. What am I misunderstanding here?
Thanks!
What is the proper way to move trained LSTM weights from one graph to another?
You can create your decoding graph first (with a saver object to save the parameters) and create a GraphDef object that you can import into your bigger training graph:
basegraph = tf.Graph()

with basegraph.as_default():
    ***your graph***

traingraph = tf.Graph()

with traingraph.as_default():
    tf.import_graph_def(basegraph.as_graph_def())
    ***your training graph***
Make sure you load your variables when you start a session for a new graph.
I don't have experience with this functionality, so you may have to look into it a bit more.
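For the variable loading itself, the usual pattern is a Saver built in each graph: save from the training session and restore in the inference session. A sketch (assuming both graphs create their variables under the same names, e.g. the same 'rnnlm' scope):
with traingraph.as_default():
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        # ... run your training ops ...
        saver.save(sess, 'checkpoints/lstm.ckpt')

with basegraph.as_default():
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, 'checkpoints/lstm.ckpt')
        # ... run your decoding ops on single examples ...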
Is it correct to say that starting a new session "releases resources", but doesn't erase the graph built in memory?
Yep, the graph object still holds it.
It seems to me like the 'reuse' feature allows Tensorflow to search outside of the current variable scope for variables with the same name (existing in a different scope), and use them in the current scope. Is this correct? If it is, what happens to all of the graph edges from the non-current scope that link to that variable? If it isn't, why does Tensorflow throw an error if you try to have the same variable name within two different scopes? It seems perfectly reasonable to define two variables with identical names in two different scopes, e.g. conv1/sum1 and conv2/sum1.
No, reuse determines the behaviour when you use get_variable on an existing name: when it is true it will return the existing variable, otherwise it will create a new one. Normally Tensorflow should not throw an error. Are you sure you're using tf.get_variable and not just tf.Variable?
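A tiny sketch of what reuse does (TF 1.x style, with a made-up layer size of 128):
with tf.variable_scope('rnnlm'):
    w = tf.get_variable('output_w', [128, 6])        # creates rnnlm/output_w

with tf.variable_scope('rnnlm', reuse=True):
    w_again = tf.get_variable('output_w', [128, 6])  # returns the existing variable

print(w is w_again)  # True

# Without reuse=True the second get_variable would raise a "variable already exists"
# error; plain tf.Variable(...) would silently create a separate variable instead.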
In my code I'm working within a new scope but the graph won't run without data to be fed into a placeholder from the initial, default scope. Is the default scope always 'in-scope' for some reason?
I don't really see what you mean. Placeholders do not always have to be used. If a placeholder is not required for running an operation, you don't have to feed it.
If graph edges can span different scopes, and names in different scopes can't be shared unless they refer to the exact same node, then that would seem to defeat the purpose of having different scopes in the first place. What am I misunderstanding here?
I think your understanding or usage of scopes is flawed; see above.