I would like to implement a custom tf.keras layer called MyLayer which has three inputs and contains a sub layer which in turn has three inputs, like in the figure below:
I assume that the right thing to do would be to create a MyLayer class that extends tf.keras.layers.Layer and implement the __init__, build and call methods, as mentioned in the official documentation.
Now, the examples provided in the documentation are relative to pretty simple layers that are composed of several sublayers connected in a sequential manner, that is one after the other. For instance, the MLPBlock layer consists of 3 linear layers ordered sequentially.
In general, however, sublayers are not ordered sequentially, but can form branches. This suggests that those layers could be run in parallel, since they are not connected to one another.
Going back to the custom layer I would like to implement, you can see that Layer1, Layer2 and Layer3 could be run in parallel. Once their outputs are computed, they can be fed to Layer4. The point is: how do I run them in parallel? I couldn't find any "ParallelCombinator" or things like that among the available Keras layers.
If I were to follow the examples provided in the documentation, I would write something along these lines:
class MyLayer(keras.layers.Layer):
    def __init__(self, ...):
        super(MyLayer, self).__init__()
        self.layer_1 = Layer1(...)
        self.layer_2 = Layer2(...)
        self.layer_3 = Layer3(...)
        self.layer_4 = Layer4(...)
    def call(self, inputs):
        tmp_1 = self.layer_1(inputs[0])
        tmp_2 = self.layer_2(inputs[1])
        tmp_3 = self.layer_3(inputs[2])
        return self.layer_4([tmp_1, tmp_2, tmp_3])
This, however, would imply that Layer1, Layer2 and Layer3 are run sequentially, not in parallel.
One possible solution that I came up with involves structuring MyLayer as a tf.keras.Model built with Keras's functional API rather than as a subclass of tf.keras.Layer, like so:
def MyLayer(...):
    input_1 = tf.keras.layers.Input(...)
    input_2 = tf.keras.layers.Input(...)
    input_3 = tf.keras.layers.Input(...)
    layer_1 = Layer1(...)(input_1)
    layer_2 = Layer2(...)(input_2)
    layer_3 = Layer3(...)(input_3)
    output_1 = Layer4(...)([layer_1, layer_2, layer_3])
    return tf.keras.Model(inputs=[input_1, input_2, input_3], outputs=output_1)
if __name__ == '__main__':
    my_layer = MyLayer(...)
    input_1 = ...
    input_2 = ...
    input_3 = ...
    output = my_layer([input_1, input_2, input_3])
The reason why I think this would work is that I assume that when I feed some inputs to a tf.keras.Model, as in output = my_layer([input_1, input_2, input_3]), the layers that can be run in parallel are effectively run in parallel (or are they?). This solution, however, feels like a hack to me, as MyLayer is supposed to be a layer, not a model. In fact, a tf.keras.Model instance exposes methods like fit(...) that aren't meant to be called on a layer.
Does anybody know what's the best approach to implement MyLayer?
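For what it's worth, here is a minimal runnable sketch of the subclassing variant. Dense and Concatenate are placeholders I picked for Layer1 to Layer4 (an assumption, since the question leaves them unspecified), and the units argument is hypothetical. As far as I understand, the statement order in call only fixes the order in which ops are recorded; once the layer is traced into a graph (for example inside tf.function or Model.fit), ops that do not depend on each other may be scheduled concurrently by the TensorFlow runtime, so a dedicated parallel combinator should not be needed. Treat that as an expectation to verify by profiling, not as a guarantee.

```python
import tensorflow as tf


class MyLayer(tf.keras.layers.Layer):
    """Three independent branches whose outputs are merged by a fourth sublayer."""

    def __init__(self, units=8, **kwargs):
        super().__init__(**kwargs)
        # Dense and Concatenate are stand-ins for Layer1..Layer4 (assumption).
        self.layer_1 = tf.keras.layers.Dense(units)
        self.layer_2 = tf.keras.layers.Dense(units)
        self.layer_3 = tf.keras.layers.Dense(units)
        self.layer_4 = tf.keras.layers.Concatenate(axis=-1)

    def call(self, inputs):
        # The three branches have no data dependency on each other.
        x1, x2, x3 = inputs
        tmp_1 = self.layer_1(x1)
        tmp_2 = self.layer_2(x2)
        tmp_3 = self.layer_3(x3)
        # The merge step depends on all three branch outputs.
        return self.layer_4([tmp_1, tmp_2, tmp_3])


my_layer = MyLayer(units=8)
inputs = [tf.ones((2, 4)), tf.ones((2, 5)), tf.ones((2, 6))]
output = my_layer(inputs)
print(output.shape)
```

Written this way, MyLayer stays a plain layer (no fit method exposed) and can still be dropped into a functional or subclassed model like any built-in layer.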
This code is from PyTorch transformer:
self.linear1 = Linear(d_model, dim_feedforward, **factory_kwargs)
self.dropout = Dropout(dropout)
self.linear2 = Linear(dim_feedforward, d_model, **factory_kwargs)
self.norm1 = LayerNorm(d_model, eps=layer_norm_eps, **factory_kwargs)
self.norm2 = LayerNorm(d_model, eps=layer_norm_eps, **factory_kwargs)
self.norm3 = LayerNorm(d_model, eps=layer_norm_eps, **factory_kwargs)
self.dropout1 = Dropout(dropout)
self.dropout2 = Dropout(dropout)
self.dropout3 = Dropout(dropout)
Why do they add self.dropout1, ...2, ...3 when self.dropout already exists and is the exact same function?
Also, what is the difference between (self.linear1, self.linear2) and self.linear?
In the case of Dropout, reusing the layer should not usually be an issue. So you could create a single self.dropout = Dropout(dropout) layer and call it multiple times in the forward function. But there may be subtle use cases which would behave differently when you do this, such as if you iterate across layers in a network for some reason. This thread, and particularly this post, discuss this in some detail.
For the linear layer, each Linear object is characterized by a set of weights and biases. If you call it multiple times in the forward function, all the calls will share and optimize the same set of weights. This can have legitimate uses, but is not appropriate when you want multiple linear layers, each with its own set of weights and biases.
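To make the difference concrete, here is a small sketch that counts parameters. The Shared and Separate module names and the count_params helper are made up for illustration; they are not part of the PyTorch transformer code.

```python
import torch
from torch import nn


class Shared(nn.Module):
    """Calls the same Linear (and the same Dropout) twice."""

    def __init__(self, d_model):
        super().__init__()
        self.linear = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(0.1)

    def forward(self, x):
        x = self.dropout(self.linear(x))
        # Second call reuses the same weight matrix and bias.
        return self.dropout(self.linear(x))


class Separate(nn.Module):
    """Uses two Linear instances, each with its own weights and bias."""

    def __init__(self, d_model):
        super().__init__()
        self.linear1 = nn.Linear(d_model, d_model)
        self.linear2 = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(0.1)

    def forward(self, x):
        x = self.dropout(self.linear1(x))
        return self.dropout(self.linear2(x))


def count_params(module):
    # Total number of learnable scalars in the module.
    return sum(p.numel() for p in module.parameters())


shared_params = count_params(Shared(4))
separate_params = count_params(Separate(4))
dropout_params = count_params(nn.Dropout(0.1))
print(shared_params, separate_params, dropout_params)
```

Dropout holds no learnable parameters, which is why reusing one instance is usually harmless, whereas reusing a Linear halves the number of trainable weights.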
The separate attributes exist to keep each Linear or Dropout layer distinct from the others. Every call to the constructor, as in self.dropout = Dropout(dropout), creates a new instance of the layer in the network.
In keras / tensorflow it is often quite simple to describe layers directly as functions that map their input to an output, like so:
def resnet_block(x, kernel_size):
    ch = x.shape[-1]
    out = Conv2D(ch, kernel_size, strides = (1,1), padding='same', activation='relu')(x)
    out = Conv2D(ch, kernel_size, strides = (1,1), padding='same', activation='relu')(out)
    out = Add()([x,out])
    return out
whereas subclassing Layer to get something like
r = ResNetBlock(kernel_size=(3,3))
y = r(x)
is a little more cumbersome (or even a lot more cumbersome for more complex examples).
Since keras seems perfectly happy to construct the underlying weights of its layers when they're being called for the first time, I was wondering if it was possible to just wrap functions such as the one above and let keras figure things out once there are inputs, i.e. I would like it to look like this:
r = FunctionWrapperLayer(lambda x:resnet_block(x, kernel_size=(3,3)))
y = r(x)
I've made an attempt at implementing FunctionWrapperLayer, which looks as follows:
class FunctionWrapperLayer(Layer):
    def __init__(self, fn):
        super(FunctionWrapperLayer, self).__init__()
        self.fn = fn
    def build(self, input_shape):
        shape = input_shape[1:]
        inputs = Input(shape)
        outputs = self.fn(inputs)
        self.model = Model(inputs=inputs, outputs=outputs)
        self.model.compile()
    def call(self, x):
        return self.model(x)
This looks like it might work, however I've run into some bizarre issues whenever I use activations, e.g. with
def bad(x):
    out = tf.keras.activations.sigmoid(x)
    out = Conv2D(1, (1,1), strides=(1,1), padding='same')(out)
    return out
x = tf.constant(tf.reshape(tf.range(48,dtype=tf.float32),[1,4,-1,1]))
w = FunctionWrapperLayer(bad)
w(x)
I get the following error
FailedPreconditionError: Error while reading resource variable _AnonymousVar34 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/_AnonymousVar34/class tensorflow::Var does not exist.
[[node conv2d_6/BiasAdd/ReadVariableOp (defined at <ipython-input-33-fc380d9255c5>:12) ]] [Op:__inference_keras_scratch_graph_353]
What this suggests to me is that there is something inherently wrong with initializing models like that in the build method. Maybe someone has a better idea as to what might be going on there or how else to get the functionality I would like.
Update:
As mentioned by jr15, the above does work when the function involved only uses keras layers. However, the following ALSO works, which has me a little puzzled:
i = Input(x.shape[1:])
o = bad(i)
model = Model(inputs=i, outputs=o)
model(x)
Incidentally, model.submodules yields
(<tensorflow.python.keras.engine.input_layer.InputLayer at 0x219d80c77c0>,
<tensorflow.python.keras.engine.base_layer.TensorFlowOpLayer at 0x219d7afc820>,
<tensorflow.python.keras.layers.convolutional.Conv2D at 0x219d7deafa0>)
meaning the activation is automatically turned into a "TensorFlowOpLayer" when doing it like that.
Another update:
Looking at the original error message, it seems like the activation isn't the only culprit. If I remove the convolution and use the wrapper everything works as well and again I find a "TensorFlowOpLayer" when inspecting the submodules.
Your solution actually works! The trouble you're running into is that tf.keras.activations.sigmoid is not a Layer, but a plain TensorFlow function. To make it work, use keras.layers.Activation("sigmoid")(x) instead. For the more general case, where you want to use some TensorFlow function as a layer, you can wrap it in a Lambda layer like so:
out = keras.layers.Lambda(lambda x: tf.some_function(x))(out)
See the docs for more info: https://keras.io/api/layers/core_layers/lambda/
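A sketch of how the function from the question might look with both suggestions applied. The name good is made up, and tf.math.square is only a placeholder for "some TensorFlow function"; neither appears in the original question.

```python
import tensorflow as tf
from tensorflow.keras.layers import Activation, Conv2D, Input, Lambda
from tensorflow.keras.models import Model


def good(x):
    # Activation is a proper Keras layer, unlike tf.keras.activations.sigmoid.
    out = Activation("sigmoid")(x)
    # Lambda wraps a plain TensorFlow function so it becomes a layer.
    out = Lambda(lambda t: tf.math.square(t))(out)
    out = Conv2D(1, (1, 1), strides=(1, 1), padding="same")(out)
    return out


# Same input tensor as in the question: shape (1, 4, 12, 1).
x = tf.reshape(tf.range(48, dtype=tf.float32), [1, 4, -1, 1])
inputs = Input(x.shape[1:])
model = Model(inputs=inputs, outputs=good(inputs))
y = model(x)
print(y.shape)
```

Every step in good is now a Keras layer, so the resulting model no longer depends on how a given TensorFlow version converts raw ops.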
With Tensorflow 2.4 it apparently just works now. The submodules now show a "TFOpLambda" layer.
To anybody interested, here is some slightly improved wrapper code that also accommodates multi-input models:
class FunctionWrapperLayer(Layer):
    def __init__(self, fn):
        super(FunctionWrapperLayer, self).__init__()
        self.fn = fn
    def build(self, input_shapes):
        super(FunctionWrapperLayer, self).build(input_shapes)
        if type(input_shapes) is list:
            inputs = [Input(shape[1:]) for shape in input_shapes]
        else:
            inputs = Input(input_shapes[1:])
        outputs = self.fn(inputs)
        self.fn_model = Model(inputs=inputs, outputs=outputs)
        self.fn_model.compile()
    def call(self, x):
        return self.fn_model(x)
I'm currently trying to learn Sonnet.
My network (incomplete, the question is based on this):
class Model(snt.AbstractModule):
    def __init__(self, name="LSTMNetwork"):
        super(Model, self).__init__(name=name)
        with self._enter_variable_scope():
            self.l1 = snt.LSTM(100)
            self.l2 = snt.LSTM(100)
            self.out = snt.LSTM(10)
    def _build(self, inputs):
        # 'inputs' is of shape (batch_size, input_length)
        # I need it to be of shape (batch_size, sequence_length, input_length)
        l1_state = self.l1.initialize_state(np.shape(inputs)[0]) # init with batch_size
        l2_state = self.l2.initialize_state(np.shape(inputs)[0]) # init with batch_size
        out_state = self.out.initialize_state(np.shape(inputs)[0])
        l1_out, l1_state = self.l1(inputs, l1_state)
        l1_out = tf.tanh(l1_out)
        l2_out, l2_state = self.l2(l1_out, l2_state)
        l2_out = tf.tanh(l2_out)
        output, out_state = self.out(l2_out, out_state)
        output = tf.sigmoid(output)
        return output, out_state
In other frameworks (e.g. Keras), LSTM inputs are of the form (batch_size, sequence_length, input_length).
However, the Sonnet documentation states that the input to Sonnet's LSTM is of the form (batch_size, input_length).
How do I use them for sequential input?
So far, I've tried using a for loop inside _build, iterating over each timestep, but that gives seemingly random outputs.
I've tried the same architecture in Keras, which runs without any issues.
I'm executing in eager mode, using GradientTape for training.
We generally wrote the RNNs in Sonnet to work on a single-timestep basis, as for Reinforcement Learning you often need to run one timestep to pick an action, and without that action you can't get the next observation (and the next input timestep) from the environment. It's easy to unroll a single-timestep module over a sequence using tf.nn.dynamic_rnn (see below). We also have a wrapper, DeepRNN, which takes care of composing several RNN cores per timestep, which I believe is what you're looking to do. This has the advantage that the DeepRNN object supports the initial state methods required for dynamic_rnn, so it's API compatible with LSTM or any other single-timestep module.
What you want to do should be achievable like this:
# Create a single-timestep RNN module by composing recurrent modules and
# non-recurrent ops.
model = snt.DeepRNN([
    snt.LSTM(100),
    tf.tanh,
    snt.LSTM(100),
    tf.tanh,
    snt.LSTM(100),
    tf.sigmoid
], skip_connections=False)
batch_size = 2
sequence_length = 3
input_size = 4
single_timestep_input = tf.random_uniform([batch_size, input_size])
sequence_input = tf.random_uniform([batch_size, sequence_length, input_size])
# Run the module on a single timestep
single_timestep_output, next_state = model(
    single_timestep_input, model.initial_state(batch_size=batch_size))
# Unroll the module on a full sequence
sequence_output, final_state = tf.nn.dynamic_rnn(
    model, sequence_input, dtype=tf.float32)
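Conceptually, unrolling a single-timestep module over a sequence is just a loop over the time axis that feeds each step's state into the next step. The following NumPy sketch illustrates that idea only; toy_core and unroll are hypothetical helpers written for this example, not Sonnet or TensorFlow APIs, and the toy core is a plain tanh recurrence, not an LSTM.

```python
import numpy as np


def toy_core(x_t, h_prev, w_x, w_h):
    """Hypothetical single-timestep core: returns (output, next_state)."""
    h_next = np.tanh(x_t @ w_x + h_prev @ w_h)
    return h_next, h_next


def unroll(core, sequence, initial_state, *params):
    """Apply `core` along axis 1 of `sequence`, threading the state through."""
    state = initial_state
    outputs = []
    for t in range(sequence.shape[1]):
        # Input at step t has shape (batch_size, input_size).
        out, state = core(sequence[:, t, :], state, *params)
        outputs.append(out)
    # Stack per-step outputs into (batch_size, sequence_length, hidden_size).
    return np.stack(outputs, axis=1), state


rng = np.random.default_rng(0)
batch_size, sequence_length, input_size, hidden_size = 2, 3, 4, 5
w_x = rng.normal(size=(input_size, hidden_size))
w_h = rng.normal(size=(hidden_size, hidden_size))
sequence_input = rng.uniform(size=(batch_size, sequence_length, input_size))
initial_state = np.zeros((batch_size, hidden_size))

sequence_output, final_state = unroll(
    toy_core, sequence_input, initial_state, w_x, w_h)
print(sequence_output.shape, final_state.shape)
```

The important part is that the state returned at step t is the state passed in at step t + 1; a loop that re-creates the initial state at every step loses all memory of earlier inputs.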
A few things to note - if you haven't already please have a look at the RNN example in the repository, as this shows a full graph mode training procedure setup around a fairly similar model.
Secondly, if you do end up needing to implement a more complex module than DeepRNN allows for, it's important to thread the recurrent state in and out of the module. In your example you're making the input state internally, and l1_state and l2_state as output are effectively discarded, so this can't be properly trained. If DeepRNN wasn't available, your model would look like this:
class LSTMNetwork(snt.RNNCore): # Note we inherit from the RNN-specific subclass
    def __init__(self, name="LSTMNetwork"):
        super(LSTMNetwork, self).__init__(name=name)
        with self._enter_variable_scope():
            self.l1 = snt.LSTM(100)
            self.l2 = snt.LSTM(100)
            self.out = snt.LSTM(10)
    def initial_state(self, batch_size):
        return (self.l1.initial_state(batch_size),
                self.l2.initial_state(batch_size),
                self.out.initial_state(batch_size))
    def _build(self, inputs, prev_state):
        # separate the components of prev_state
        l1_prev_state, l2_prev_state, out_prev_state = prev_state
        l1_out, l1_next_state = self.l1(inputs, l1_prev_state)
        l1_out = tf.tanh(l1_out)
        l2_out, l2_next_state = self.l2(l1_out, l2_prev_state)
        l2_out = tf.tanh(l2_out)
        output, out_next_state = self.out(l2_out, out_prev_state)
        # Output state of LSTMNetwork contains the output states of inner modules.
        full_output_state = (l1_next_state, l2_next_state, out_next_state)
        return tf.sigmoid(output), full_output_state
Finally, if you're using eager mode I would strongly encourage you to have a look at Sonnet 2 - it's a complete rewrite for TF 2 / Eager mode. It's not backwards compatible, but all the same kinds of module compositions are possible. Sonnet 1 was written primarily for Graph mode TF, and while it does work with Eager mode you'll probably encounter some things that aren't very convenient.
We worked closely with the TensorFlow team to make sure that TF 2 & Sonnet 2 work nicely together, so please have a look: (https://github.com/deepmind/sonnet/tree/v2). Sonnet 2 should be considered alpha, and is being actively developed, so we don't have loads of examples yet, but more will be added in the near future.
I know that you can reuse Keras layers. For example, I declare two layers for a decoder network:
decoder_layer_1 = Dense(intermediate_dim,activation='relu',name='decoder_layer_1')
decoder_layer_2 = Dense(intermediate_dim,activation='relu',name='decoder_layer_2')
Use in first model:
decoded = decoder_layer_1(z)
decoded = decoder_layer_2(decoded)
Use in second model:
_decoded = decoder_layer_1(decoder_input)
_decoded = decoder_layer_2(_decoded)
The above method is OK if I need to reuse only a couple of layers, but cumbersome if I want to reuse a large number of layers (e.g. a decoder network with 10 layers). Is there a more efficient way to do it than explicitly declaring each layer? Is there a way to implement it as shown below:
decoder_layers = group_of_layers()
Reuse in the first model:
decoded = decoder_layers(z)
Reuse in the second model:
_decoded = decoder_layers(decoder_input)
I struggled with this problem too. What works for me is to wrap shared parts in a model, with its own input definition:
def group_of_layers(intermediate_dim):
    shared_model_input = keras.layers.Input(shape=...)
    shared_internal_layer = keras.layers.Dense(intermediate_dim, activation='relu', name='shared_internal_layer')(shared_model_input)
    shared_model_output = keras.layers.Dense(intermediate_dim, activation='relu', name='shared_model_output')(shared_internal_layer)
    return keras.models.Model(shared_model_input, shared_model_output)
In the Functional API, you can use the shared model in the same way as a single layer, as long as the model's input layer matches the shape of the layers you apply it to:
group = group_of_layers(intermediate_dim)
result1 = group(previous_layer)
result2 = group(different_previous_layer)
The weights are going to be shared then.
This is nicely described in the documentation, see Shared vision model.
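A self-contained version of the same idea, with a check that the two outer models really do hold the same weight objects. The input_dim parameter and the concrete sizes (4 and 8) are assumptions added so the sketch runs; the original leaves the input shape unspecified.

```python
import tensorflow as tf
from tensorflow import keras


def group_of_layers(input_dim, intermediate_dim):
    # Wrap the reusable stack in its own Model so it can be called like a layer.
    shared_input = keras.layers.Input(shape=(input_dim,))
    hidden = keras.layers.Dense(intermediate_dim, activation="relu")(shared_input)
    output = keras.layers.Dense(intermediate_dim, activation="relu")(hidden)
    return keras.models.Model(shared_input, output)


group = group_of_layers(input_dim=4, intermediate_dim=8)

# Two separate outer models that both call the same group.
in_a = keras.layers.Input(shape=(4,))
in_b = keras.layers.Input(shape=(4,))
model_a = keras.models.Model(in_a, group(in_a))
model_b = keras.models.Model(in_b, group(in_b))

# Two Dense layers -> two kernels and two biases in the shared group.
n_group_weights = len(group.trainable_weights)
# Identity check: both outer models reference the very same variables.
weights_are_shared = all(
    wa is wb
    for wa, wb in zip(model_a.trainable_weights, model_b.trainable_weights)
)
print(n_group_weights, weights_are_shared)
```

Because the variables are the same objects, training either outer model updates the weights seen by the other.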
I have a CNN, and I want to fetch the value of some intermediate layer corresponding to some key from the state dict.
How could this be done?
Thanks.
I think you need to create a new class that redefines the forward pass through a given model. Most probably you will have to write code that is specific to the architecture of your model. Here is an example:
import torch
from torch.autograd import Variable
from torchvision import models
class extract_layers():
    def __init__(self, model, target_layer):
        self.model = model
        self.target_layer = target_layer
    def __call__(self, x):
        return self.forward(x)
    def forward(self, x):
        module = self.model._modules[self.target_layer]
        # get output of the desired layer
        features = module(x)
        # get output of the whole model
        x = self.model(x)
        return x, features
model = models.vgg19(pretrained=True)
target_layer = 'features'
extractor = extract_layers(model, target_layer)
image = Variable(torch.randn(1, 3, 244, 244))
x, features = extractor(image)
In this case, I am using the pre-defined vgg19 network from the PyTorch model zoo. The network has its layers structured in two modules: features for the convolutional part and classifier for the fully-connected part. Since features wraps all the convolutional layers of the network, extracting its output is straightforward. If your architecture has several layers with different names, you will need to store their output using something similar to this:
for name, module in self.model._modules.items():
    x = module(x) # forward the module individually
    if name in self.target_layer:
        features = x # store the output of the desired layer
Also, you should keep in mind that you need to reshape the output of the layer that connects the convolutional part to the fully-connected one. It should be easy to do if you know the name of that layer.
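As an alternative to rewriting the forward pass, a forward hook can capture a layer's output without touching the model code. Note this is a different technique from the one above, not a variation of it. The sketch below uses a tiny stand-in network instead of vgg19 so that it runs without downloading weights; the save_output helper and the "relu" key are made up for illustration.

```python
import torch
from torch import nn

# Small stand-in CNN (assumption: any nn.Module would work the same way).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 2),
)

features = {}


def save_output(name):
    # Returns a hook that stores the module's output under `name`.
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook


# Look the target module up by name; state dict keys such as "0.weight"
# share their prefix ("0") with these module names.
target = dict(model.named_modules())["1"]
handle = target.register_forward_hook(save_output("relu"))

image = torch.randn(1, 3, 32, 32)
logits = model(image)
handle.remove()  # detach the hook once it is no longer needed

print(features["relu"].shape, logits.shape)
```

The hook fires during the normal forward pass, so no manual reshaping between the convolutional and fully-connected parts is needed.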