I was learning how to make custom layers in TensorFlow but could not figure out how to add trainable weights. For example:
class Linear(layers.Layer):
    def __init__(self, units = 32, **kwargs):
        super().__init__(kwargs)
        self.units = units

    def build(self, input_shape):
        self.layer = layers.Dense(self.units, trainable=True)
        super().build(input_shape)

    def call(self, inputs):
        return self.layer(inputs)
Now if I do
linear_layer = Linear(8)
x = tf.ones(shape=(4, 3))
y = linear_layer(x)
print(linear_layer.trainable_variables)
I get an empty list, and thus during gradient calculation I get no gradients. My question is: how do I create custom layers in such a way that the default Keras layers used inside them are also trainable? One more thing: if I do linear_layer.weights it does give me the weights, which means there is some problem specifically with the trainable weights.
I'm stuck on this.
To get the trainable variables, you have to access the "layer" attribute of your custom layer:
linear_layer = Linear(8)
x = tf.ones(shape=(4, 3))
y = linear_layer(x)
print(linear_layer.layer.trainable_variables)
Note that you are just creating a pre-built layer (Dense) in the build method instead of creating the weights of your custom layer yourself. See https://www.tensorflow.org/tutorials/customization/custom_layers
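For reference, here is a minimal sketch (based on the linked tutorial, not on the original post) of a Linear layer that creates its own weights with add_weight, so that trainable_variables is populated directly on the custom layer:

import tensorflow as tf
from tensorflow.keras import layers

class Linear(layers.Layer):
    def __init__(self, units=32, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Register the kernel and bias as trainable weights of this layer.
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer="random_normal",
                                 trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer="zeros",
                                 trainable=True)
        super().build(input_shape)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

linear_layer = Linear(8)
y = linear_layer(tf.ones(shape=(4, 3)))
print(linear_layer.trainable_variables)  # now lists the kernel and bias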
I want to get the weights of my custom layer, but I couldn't get them with model.layers[X].get_weights().
So I checked the layers of the model, it seems that the custom layer is decomposed into several operations and no weights can be found in these layers.
Here is the custom layer code:
class PixelBaseConv(Layer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(PixelBaseConv, self).__init__(**kwargs)

    def build(self, input_shape):
        # kernel_shape: w*h*c*output_dim
        kernel_size = input_shape[1:]
        kernel_shape = (1,) + kernel_size + (self.output_dim, )
        self.kernel = self.add_weight(name='kernel',
                                      shape=kernel_shape,
                                      initializer='uniform',
                                      trainable=True)
        super(PixelBaseConv, self).build(input_shape)

    def call(self, inputs):
        # output_shape: w*h*output_dim
        outputs = []
        inputs = K.cast(inputs, dtype="float32")
        for i in range(self.output_dim):
            # output = tf.keras.layers.Multiply()([inputs, self.kernel[..., i]])
            output = inputs * self.kernel[..., i]
            output = K.sum(output, axis=-1)
            if len(outputs) != 0:
                outputs = np.dstack([outputs, output])
            else:
                outputs = output[..., np.newaxis]
        return tf.convert_to_tensor(outputs)

    def compute_output_shape(self, input_shape):
        return input_shape + (self.output_dim, )
Here is part of the model structure:
[screenshot of the model summary, showing the custom layer split into several separate operation layers]
I tried different ways to obtain the weights, but because of these strange layers, all of them failed.
Expected: the first five layers are replaced by a single layer which has a trainable kernel, and its weights can be obtained directly with get_weights().
I listed the weight-list length of the first 10 layers and printed the weights of layer 1 with the following code:
for i in range(len(model.layers)):
    print("layer " + str(i), len(model.layers[i].get_weights()))

print(model.layers[1].get_weights()[0])
and got the following result and error:
[screenshots of the output and the resulting error: the decomposed layers report empty weight lists, and indexing into get_weights() raises an error]
I found why this problem occurred.
I wrote the custom layer using
import tensorflow.python.keras
while using other Keras layers and creating the model with
import tensorflow.keras
I think these two libraries may not be compatible, so my custom layer was split into several operation layers. Thus, the weights could not be obtained or updated.
I changed all imports to tensorflow.keras, now everything goes well.
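A minimal sketch of the fix (the class body itself is unchanged from the question; only the imports matter here):

# Use the public tensorflow.keras API everywhere, rather than mixing it with
# the private tensorflow.python.keras implementation path.
# Before (problematic):
#   from tensorflow.python.keras.layers import Layer
# After (consistent with the rest of the model):
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Layer
from tensorflow.keras import backend as K

class PixelBaseConv(Layer):
    ...  # unchanged implementation from the question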
I am trying to implement a weighted average between two tensors in TensorFlow, where the weight can be learned automatically. Following the advice on how to design a custom layer for a keras model here, my attempt is the following:
class WeightedAverage(tf.keras.layers.Layer):
    def __init__(self):
        super(WeightedAverage, self).__init__()
        init_value = tf.keras.initializers.Constant(value=0.5)
        self.w = self.add_weight(name="weight",
                                 initializer=init_value,
                                 trainable=True)

    def call(self, inputs):
        return tf.keras.layers.average([inputs[0] * self.w,
                                        inputs[1] * (1 - self.w)])
Now the problem is that after training the model, saving it, and loading it again, the value of w remains 0.5. Is it possible that the parameter does not receive any gradient updates? When printing the trainable variables of my model, the parameter is listed, so it should be included when calling model.fit.
Here is a possible way to implement a weighted average between two tensors where the weight can be learned automatically. I also introduce the constraint that the weights must sum to 1. To guarantee this, we simply apply a softmax to our weights. In the dummy example below I use this method to combine the outputs of two fully connected branches, but you can apply it in any other scenario.
Here is the custom layer:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Layer, Input, Dense, Concatenate
from tensorflow.keras.models import Model

class WeightedAverage(Layer):
    def __init__(self):
        super(WeightedAverage, self).__init__()

    def build(self, input_shape):
        self.W = self.add_weight(
            shape=(1, 1, len(input_shape)),
            initializer='uniform',
            dtype=tf.float32,
            trainable=True)

    def call(self, inputs):
        # inputs is a list of tensors of shape [(n_batch, n_feat), ..., (n_batch, n_feat)]
        # expand the last dim of each input: [(n_batch, n_feat, 1), ..., (n_batch, n_feat, 1)]
        inputs = [tf.expand_dims(i, -1) for i in inputs]
        inputs = Concatenate(axis=-1)(inputs)     # (n_batch, n_feat, n_inputs)
        weights = tf.nn.softmax(self.W, axis=-1)  # (1, 1, n_inputs)
        # the weights sum to one on the last dim
        return tf.reduce_sum(weights * inputs, axis=-1)  # (n_batch, n_feat)
Here is the full example in a regression problem:
inp1 = Input((100,))
inp2 = Input((100,))
x1 = Dense(32, activation='relu')(inp1)
x2 = Dense(32, activation='relu')(inp2)
W_Avg = WeightedAverage()([x1,x2])
out = Dense(1)(W_Avg)
m = Model([inp1,inp2], out)
m.compile('adam','mse')
n_sample = 1000
X1 = np.random.uniform(0,1, (n_sample,100))
X2 = np.random.uniform(0,1, (n_sample,100))
y = np.random.uniform(0,1, (n_sample,1))
m.fit([X1,X2], y, epochs=10)
In the end, you can also inspect the value of the weights in this way:
tf.nn.softmax(m.get_weights()[-3]).numpy()
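As a small addition of my own (not part of the original answer): instead of hard-coding the -3 index into get_weights(), you can read the weight from the layer object itself, assuming the WeightedAverage layer defined above:

# Locate the WeightedAverage layer in the model and read its softmaxed weights.
w_avg_layer = [l for l in m.layers if isinstance(l, WeightedAverage)][0]
print(tf.nn.softmax(w_avg_layer.W, axis=-1).numpy())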
I have a custom layer that contains Dense sublayers. I want to be able to name the weights of these sublayers. However, using name="my_dense" on the sublayer initializer doesn't seem to do this; the weights simply get named after the outer custom layer.
To illustrate the problem, suppose I want a custom layer that simply stacks two dense layers. I'll print the names of the weights of this custom layer.
class DoubleDense(keras.layers.Layer):
    def __init__(self, units, **kwargs):
        self.dense1 = keras.layers.Dense(units, name="first_dense")
        self.dense2 = keras.layers.Dense(units, name="second_dense")
        super(DoubleDense, self).__init__(**kwargs)

    def build(self, input_shape):
        self.dense1.build(input_shape)
        self.dense2.build(self.dense1.units)

    def call(self, input):
        hidden = self.dense1(input)
        return self.dense2(hidden)

dd = DoubleDense(3)

# We need to evaluate the layer once to build the weights
trivial_input = tf.ones((1, 10))
output = dd(trivial_input)

# Print the names of all variables in the DoubleDense layer
print([weight.name for weight in dd.weights])
The output is this:
['double_dense_1/kernel:0',
'double_dense_1/bias:0',
'double_dense_1/kernel:0',
'double_dense_1/bias:0']
...but I was expecting something more like this:
['double_dense_1/first_dense_1/kernel:0',
'double_dense_1/first_dense_1/bias:0',
'double_dense_1/second_dense_1/kernel:0',
'double_dense_1/second_dense_1/bias:0']
So, Keras has named these weights ambiguously; there is no way to tell whether a weight tensor belongs to dd.dense1 or dd.dense2 by its name alone. I realise I could select the layer first and then the weights (dd.dense1.weights), but I would prefer not to do this in my application.
Is there a way to name the weights of a sublayer of a custom layer?
If you want names for the sublayers, you need to wrap each one in a name_scope and then call build for each layer explicitly.
Below is the modified code, which gives each sublayer's weights their own names in the output.
class DoubleDense(keras.layers.Layer):
    def __init__(self, units, **kwargs):
        self.dense1 = keras.layers.Dense(units)
        self.dense2 = keras.layers.Dense(units)
        super(DoubleDense, self).__init__(**kwargs)

    def build(self, input_shape):
        with tf.name_scope("first_dense"):
            self.dense1.build(input_shape)
        with tf.name_scope("second_dense"):
            self.dense2.build(self.dense1.units)

    def call(self, input):
        hidden = self.dense1(input)
        return self.dense2(hidden)

dd = DoubleDense(3)

# We need to evaluate the layer once to build the weights
trivial_input = tf.ones((1, 10))
output = dd(trivial_input)

# Print the names of all variables in the DoubleDense layer
print([weight.name for weight in dd.weights])
Output:
['double_dense/first_dense/kernel:0',
 'double_dense/first_dense/bias:0',
 'double_dense/second_dense/kernel:0',
 'double_dense/second_dense/bias:0']
Hope this answers your question, Happy Learning!
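As a small follow-up of my own (not from the original answer): once the name scopes are in place, you can also pick out a sublayer's weights by their name prefix alone:

# Select the first sublayer's variables by the name_scope prefix used in build.
first_dense_vars = [w for w in dd.weights if "first_dense" in w.name]
print([w.name for w in first_dense_vars])
# ['double_dense/first_dense/kernel:0', 'double_dense/first_dense/bias:0']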
I'm very new to Keras and I'm writing a custom layer which implements a Gaussian function [exp(-(w*x - mean)^2 / sigma^2), where W, mean, and sigma are all randomly generated].
Below is code for the custom layer:
class Gaussian(Layer):
    def __init__(self, **kwargs):
        super(Gaussian, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create trainable weights for this layer.
        self.W_init = np.random.rand(1, input_shape[1])
        self.W = K.variable(self.W_init, name="W")

        # Create trainable means for this layer.
        self.mean_init = np.random.rand(1, input_shape[1])
        self.mean = K.variable(self.mean_init, name="mean")

        # Create trainable sigmas for this layer.
        self.sigma_init = np.random.rand(1, input_shape[1])
        self.sigma = K.variable(self.sigma_init, name="sigma")

        self.trainable_weights = [self.mean, self.sigma]
        super(Gaussian, self).build(input_shape)  # Be sure to call this somewhere!

    def call(self, x):
        result = tf.multiply(x, self.W)
        result = tf.subtract(x, self.mean)
        result = tf.multiply(tf.square(result), -1)
        result = tf.divide(result, tf.square(self.sigma))
        return result

    def compute_output_shape(self, input_shape):
        return input_shape
After putting it as the first layer in the Keras MNIST tutorial (I just wanted to make sure it runs without producing errors; I didn't care about accuracy) and training the model, the loss stopped decreasing after around 4 epochs, and only the values of "mean" and "sigma" changed after training while the values of "W" stayed the same. However, this doesn't happen if I put it as the second layer.
I ran the Keras mnist tutorial again without the custom layer and found out that the weights of the first layer didn't change either.
Is not updating the weights of the first layer (more specifically, the very first parameter) a Keras thing, or am I missing something? Can I force it to update?
Thank you!
You are not implementing your layer correctly: Keras is not aware of your weights, which means they are not being trained by gradient descent. Take a look at this example:
from keras import backend as K
from keras.engine.topology import Layer
import numpy as np
class MyLayer(Layer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        super(MyLayer, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        return K.dot(x, self.kernel)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)
Here you have to use add_weight to obtain a trainable weight, not just use K.variable as you are currently doing. This way your weights will be registered with Keras and they will be trained properly. You should do this for all trainable parameters in your layer.
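For illustration, here is a minimal sketch of my own (the initializers and names are illustrative assumptions) of the Gaussian layer from the question rewritten with add_weight, following the formula exp(-(w*x - mean)^2 / sigma^2) stated there:

from keras import backend as K
from keras.engine.topology import Layer

class Gaussian(Layer):
    def build(self, input_shape):
        # All three parameters are registered via add_weight, so Keras tracks
        # and trains them.
        self.W = self.add_weight(name='W', shape=(1, input_shape[1]),
                                 initializer='uniform', trainable=True)
        self.mean = self.add_weight(name='mean', shape=(1, input_shape[1]),
                                    initializer='uniform', trainable=True)
        self.sigma = self.add_weight(name='sigma', shape=(1, input_shape[1]),
                                     initializer='uniform', trainable=True)
        super(Gaussian, self).build(input_shape)

    def call(self, x):
        # exp(-(w*x - mean)^2 / sigma^2), as described in the question
        return K.exp(-K.square(x * self.W - self.mean) / K.square(self.sigma))

    def compute_output_shape(self, input_shape):
        return input_shape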