Related
I would like to extract and store the dropout mask [array of 1/0s] from a dropout layer in a Sequential Keras model at each batch while training. I was wondering if there was a straight forward way way to do this within Keras or if I would need to switch over to tensorflow (How to get the dropout mask in Tensorflow).
Would appreciate any help! I'm quite new to TensorFlow and Keras.
There are a couple of functions (dropout_layer.get_output_mask(), dropout_layer.get_input_mask()) for the dropout layer that I tried using but got None after calling on the previous layer.
model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(name="flat", input_shape=(28, 28, 1)))
model.add(tf.keras.layers.Dense(
512,
activation='relu',
name = 'dense_1',
kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123),
bias_initializer='zeros'))
dropout = tf.keras.layers.Dropout(0.2, name = 'dropout') #want this layer's mask
model.add(dropout)
x = dropout.output_mask
y = dropout.input_mask
model.add(tf.keras.layers.Dense(
10,
activation='softmax',
name='dense_2',
kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123),
bias_initializer='zeros'))
model.compile(...)
model.fit(...)
It's not easily exposed in Keras. It goes deep until it calls the Tensorflow dropout.
So, although you're using Keras, it's will also be a tensor in the graph that can be gotten by name (finding it's name: In Tensorflow, get the names of all the Tensors in a graph).
This option, of course will lack some keras information, you should probably have to do that inside a Lambda layer so Keras adds certain information to the tensor. And you must take extra care because the tensor will exist even when not training (where the mask is skipped)
Now, you can also use a less hacky way, that may consume a little processing:
def getMask(x):
boolMask = tf.not_equal(x, 0)
floatMask = tf.cast(boolMask, tf.float32) #or tf.float64
return floatMask
Use a Lambda(getMasc)(output_of_dropout_layer)
But instead of using a Sequential model, you will need a functional API Model.
inputs = tf.keras.layers.Input((28, 28, 1))
outputs = tf.keras.layers.Flatten(name="flat")(inputs)
outputs = tf.keras.layers.Dense(
512,
# activation='relu', #relu will be a problem here
name = 'dense_1',
kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123),
bias_initializer='zeros')(outputs)
outputs = tf.keras.layers.Dropout(0.2, name = 'dropout')(outputs)
mask = Lambda(getMask)(outputs)
#there isn't "input_mask"
#add the missing relu:
outputs = tf.keras.layers.Activation('relu')(outputs)
outputs = tf.keras.layers.Dense(
10,
activation='softmax',
name='dense_2',
kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123),
bias_initializer='zeros')(outputs)
model = Model(inputs, outputs)
model.compile(...)
model.fit(...)
Training and predicting
Since you can't train the masks (it doesn't make any sense), it should not be an output of the model for training.
Now, we could try this:
trainingModel = Model(inputs, outputs)
predictingModel = Model(inputs, [output, mask])
But masks don't exist in prediction, because dropout is only applied in training. So this doesn't bring us anything good in the end.
The only way for training is then using a dummy loss and dummy targets:
def dummyLoss(y_true, y_pred):
return y_true #but this might evoke a "None" gradient problem since it's not trainable, there is no connection to any weights, etc.
model.compile(loss=[loss_for_main_output, dummyLoss], ....)
model.fit(x_train, [y_train, np.zeros((len(y_Train),) + mask_shape), ...)
It's not guaranteed that these will work.
I found a very hacky way to do this by trivially extending the provided dropout layer. (Almost all code from TF.)
class MyDR(tf.keras.layers.Layer):
def __init__(self,rate,**kwargs):
super(MyDR, self).__init__(**kwargs)
self.noise_shape = None
self.rate = rate
def _get_noise_shape(self,x, noise_shape=None):
# If noise_shape is none return immediately.
if noise_shape is None:
return array_ops.shape(x)
try:
# Best effort to figure out the intended shape.
# If not possible, let the op to handle it.
# In eager mode exception will show up.
noise_shape_ = tensor_shape.as_shape(noise_shape)
except (TypeError, ValueError):
return noise_shape
if x.shape.dims is not None and len(x.shape.dims) == len(noise_shape_.dims):
new_dims = []
for i, dim in enumerate(x.shape.dims):
if noise_shape_.dims[i].value is None and dim.value is not None:
new_dims.append(dim.value)
else:
new_dims.append(noise_shape_.dims[i].value)
return tensor_shape.TensorShape(new_dims)
return noise_shape
def build(self, input_shape):
self.noise_shape = input_shape
print(self.noise_shape)
super(MyDR,self).build(input_shape)
#tf.function
def call(self,input):
self.noise_shape = self._get_noise_shape(input)
random_tensor = tf.random.uniform(self.noise_shape, seed=1235, dtype=input.dtype)
keep_prob = 1 - self.rate
scale = 1 / keep_prob
# NOTE: if (1.0 + rate) - 1 is equal to rate, then we want to consider that
# float to be selected, hence we use a >= comparison.
self.keep_mask = random_tensor >= self.rate
#NOTE: here is where I save the binary masks.
#the file grows quite big!
tf.print(self.keep_mask,output_stream="file://temp/droput_mask.txt")
ret = input * scale * math_ops.cast(self.keep_mask, input.dtype)
return ret
Note: I posted about this issue already here. I'm creating a new question because:
1. I think the issue specifically relates to reshaping my mask within my custom layer, but I'm not sure enough of that to completely ignore the other error I wrote about in the original post.
2. There are many posts about reshaping Keras layers or adding Masking layers, but I couldn't find any about reshaping a mask within a layer, so I hope this post can be useful more generally.
The issue:
I have a custom Keras layer that takes 2D input and returns 3D output (batch_size, max_length, 1024), which is passed on to a BiLSTM followed by a CRF.
The custom Keras layer is copied from this repository. The difference is I take the 'elmo' instead of 'default' outputs from the Elmo model, so that the output is 3D as required by the BiLSTM:
result = self.elmo(K.squeeze(K.cast(x, tf.string), axis=1),
as_dict=True,
signature='default',
)['elmo'] # The original code used 'default'
However the compute_mask function isn't appropriate for my architecture, as it's output is 2D. Thus I get the error:
InvalidArgumentError: Incompatible shapes: [32,47] vs. [32,0] [[{{node loss/crf_1_loss/mul_6}}]]
where 32 is batch size and 47 is one less than my specified max_length.
I'm sure I need to reshape the mask, but I couldn't find out anywhere how.
Happy to make a git repo with the whole thing and/or full stack trace if need be.
Custom ELMo Layer:
class ElmoEmbeddingLayer(Layer):
def __init__(self, **kwargs):
self.dimensions = 1024
self.trainable = True
super(ElmoEmbeddingLayer, self).__init__(**kwargs)
def build(self, input_shape):
self.elmo = hub.Module('https://tfhub.dev/google/elmo/2', trainable=self.trainable, name="{}_module".format(self.name))
self.trainable_weights += K.tf.trainable_variables(scope="^{}_module/.*".format(self.name))
super(ElmoEmbeddingLayer, self).build(input_shape)
def call(self, x, mask=None):
result = self.elmo(K.squeeze(K.cast(x, tf.string), axis=1),
as_dict=True, signature='default',)['elmo']
return result
# Original compute_mask function. Raises;
# InvalidArgumentError: Incompatible shapes: [32,47] vs. [32,0] [[{{node loss/crf_1_loss/mul_6}}]]
def compute_mask(self, inputs, mask=None):
return K.not_equal(inputs, '__PAD__')
def compute_output_shape(self, input_shape):
return input_shape[0], 48, self.dimensions
The model is built as follows:
def build_model(): # uses crf from keras_contrib
input = layers.Input(shape=(1,), dtype=tf.string)
model = ElmoEmbeddingLayer(name='ElmoEmbeddingLayer')(input)
model = Bidirectional(LSTM(units=512, return_sequences=True))(model)
crf = CRF(num_tags)
out = crf(model)
model = Model(input, out)
model.compile(optimizer="rmsprop", loss=crf_loss, metrics=[crf_accuracy, categorical_accuracy, mean_squared_error])
model.summary()
return model
I'm trying to implement an unsupervised ANN using Hebbian updating in Keras. I found a custom Hebbian layer made by Dan Saunders here - https://github.com/djsaunde/rinns_python/blob/master/hebbian/hebbian.py
(I hope it is not poor form to ask questions about another person's code here)
In the examples I found using this layer in the repo, this layer is used as an intermediate layer between Dense/Conv layers, but I would like to construct a network using only Hebbian layers.
Two critical things are confusing me in this implementation:
It seems as though input dims and output dims must be the same for this layer to work. Why would this be the case and what can I do to make it so they can be different?
Why is the diagonal of the weight matrix set to zero? It says this is to "ensure that no neuron is laterally connected to itself", but I thought the connection weights were between the previous layer and the current layer, not the current layer and itself.
Here is the code for the Hebbian Layer Implementation:
from keras import backend as K
from keras.engine.topology import Layer
import numpy as np
import tensorflow as tf
np.set_printoptions(threshold=np.nan)
sess = tf.Session()
class Hebbian(Layer):
def __init__(self, output_dim, lmbda=1.0, eta=0.0005, connectivity='random', connectivity_prob=0.25, **kwargs):
'''
Constructor for the Hebbian learning layer.
args:
output_dim - The shape of the output / activations computed by the layer.
lambda - A floating-point valued parameter governing the strength of the Hebbian learning activation.
eta - A floating-point valued parameter governing the Hebbian learning rate.
connectivity - A string which determines the way in which the neurons in this layer are connected to
the neurons in the previous layer.
'''
self.output_dim = output_dim
self.lmbda = lmbda
self.eta = eta
self.connectivity = connectivity
self.connectivity_prob = connectivity_prob
if self.connectivity == 'random':
self.B = np.random.random(self.output_dim) < self.connectivity_prob
elif self.connectivity == 'zero':
self.B = np.zeros(self.output_dim)
super(Hebbian, self).__init__(**kwargs)
def random_conn_init(self, shape, dtype=None):
A = np.random.normal(0, 1, shape)
A[self.B] = 0
return tf.constant(A, dtype=tf.float32)
def zero_init(self, shape, dtype=None):
return np.zeros(shape)
def build(self, input_shape):
# create weight variable for this layer according to user-specified initialization
if self.connectivity == 'all':
self.kernel = self.add_weight(name='kernel', shape=(np.prod(input_shape[1:]), \
np.prod(self.output_dim)), initializer='uniform', trainable=False)
elif self.connectivity == 'random':
self.kernel = self.add_weight(name='kernel', shape=(np.prod(input_shape[1:]), \
np.prod(self.output_dim)), initializer=self.random_conn_init, trainable=False)
elif self.connectivity == 'zero':
self.kernel = self.add_weight(name='kernel', shape=(np.prod(input_shape[1:]), \
np.prod(self.output_dim)), initializer=self.zero_init, trainable=False)
else:
raise NotImplementedError
# ensure that no neuron is laterally connected to itself
self.kernel = self.kernel * tf.diag(tf.zeros(self.output_dim))
# call superclass "build" function
super(Hebbian, self).build(input_shape)
def call(self, x):
x_shape = tf.shape(x)
batch_size = tf.shape(x)[0]
# reshape to (batch_size, product of other dimensions) shape
x = tf.reshape(x, (tf.reduce_prod(x_shape[1:]), batch_size))
# compute activations using Hebbian-like update rule
activations = x + self.lmbda * tf.matmul(self.kernel, x)
# compute outer product of activations matrix with itself
outer_product = tf.matmul(tf.expand_dims(x, 1), tf.expand_dims(x, 0))
# update the weight matrix of this layer
self.kernel = self.kernel + tf.multiply(self.eta, tf.reduce_mean(outer_product, axis=2))
self.kernel = tf.multiply(self.kernel, self.B)
self.kernel = self.kernel * tf.diag(tf.zeros(self.output_dim))
return K.reshape(activations, x_shape)
At first inspection I expected this layer to be able to take inputs from a previous layer, perform a simple activation calculation (input * weight), update the weights according to Hebbian updating (something like - if activation is high b/t nodes, increase weight), then pass the activations to the next layer.
I also expected that it would be able to deal with decreasing/increasing the number of nodes from one layer to the next.
Instead, I cannot seem to figure out why the input and output dims must be the same and why the diagonals of the weight matrix are set to zero.
Where in the code (implicitly or explicitly) is the specification that the layers need to be the same dims?
Where in the code (implicitly or explicitly) is the specification that this layer's weight matrix is connecting the current layer to itself?
Apologies if this Q should have been separated into 2, but it seems like they may be related to e/o so I kept them as 1.
Happy to provide more details if needed.
Edit: Realized I forgot to add the error message I get when I try to create a layer with different output dims than the input dims:
model = Sequential()
model.add(Hebbian(input_shape = (256,1), output_dim = 256))
This compiles w/o error ^
model = Sequential()
model.add(Hebbian(input_shape = (256,1), output_dim = 24))
This ^ throws the error:
IndexError: boolean index did not match indexed array along dimension 0; dimension is 256 but corresponding boolean dimension is 24
Okay I think I maybe figured it out, sort of. There were many small problems but the biggest thing was I needed to add the compute_output_shape function which makes the layer able to modify the shape of its input as explained here:
https://keras.io/layers/writing-your-own-keras-layers/
So here is the code with all the changes I made. It will compile and modify the input shape just fine. Note that this layer computes weight changes inside the layer itself and there may be some issues with that if you try to actually use the layer (I'm still ironing these out), but this is a separate issue.
class Hebbian(Layer):
def __init__(self, output_dim, lmbda=1.0, eta=0.0005, connectivity='random', connectivity_prob=0.25, **kwargs):
'''
Constructor for the Hebbian learning layer.
args:
output_dim - The shape of the output / activations computed by the layer.
lambda - A floating-point valued parameter governing the strength of the Hebbian learning activation.
eta - A floating-point valued parameter governing the Hebbian learning rate.
connectivity - A string which determines the way in which the neurons in this layer are connected to
the neurons in the previous layer.
'''
self.output_dim = output_dim
self.lmbda = lmbda
self.eta = eta
self.connectivity = connectivity
self.connectivity_prob = connectivity_prob
super(Hebbian, self).__init__(**kwargs)
def random_conn_init(self, shape, dtype=None):
A = np.random.normal(0, 1, shape)
A[self.B] = 0
return tf.constant(A, dtype=tf.float32)
def zero_init(self, shape, dtype=None):
return np.zeros(shape)
def build(self, input_shape):
# create weight variable for this layer according to user-specified initialization
if self.connectivity == 'random':
self.B = np.random.random(input_shape[0]) < self.connectivity_prob
elif self.connectivity == 'zero':
self.B = np.zeros(self.output_dim)
if self.connectivity == 'all':
self.kernel = self.add_weight(name='kernel', shape=(np.prod(input_shape[1:]), \
np.prod(self.output_dim)), initializer='uniform', trainable=False)
elif self.connectivity == 'random':
self.kernel = self.add_weight(name='kernel', shape=(np.prod(input_shape[1:]), \
np.prod(self.output_dim)), initializer=self.random_conn_init, trainable=False)
elif self.connectivity == 'zero':
self.kernel = self.add_weight(name='kernel', shape=(np.prod(input_shape[1:]), \
np.prod(self.output_dim)), initializer=self.zero_init, trainable=False)
else:
raise NotImplementedError
# call superclass "build" function
super(Hebbian, self).build(input_shape)
def call(self, x): # x is the input to the network
x_shape = tf.shape(x)
batch_size = tf.shape(x)[0]
# reshape to (batch_size, product of other dimensions) shape
x = tf.reshape(x, (tf.reduce_prod(x_shape[1:]), batch_size))
# compute activations using Hebbian-like update rule
activations = x + self.lmbda * tf.matmul(self.kernel, x)
# compute outer product of activations matrix with itself
outer_product = tf.matmul(tf.expand_dims(x, 1), tf.expand_dims(x, 0))
# update the weight matrix of this layer
self.kernel = self.kernel + tf.multiply(self.eta, tf.reduce_mean(outer_product, axis=2))
self.kernel = tf.multiply(self.kernel, self.B)
return K.reshape(activations, x_shape)
def compute_output_shape(self, input_shape):
return (input_shape[0], self.output_dim)
If anyone comes here from Google (like me; repeatedly) trying to make a layer that learns online when called on new input, I just found this other question and I think it's relevant:
Persistent Variable in keras Custom Layer
Self.call is only called when you are defining the graph, for learning to happen on every new input you need to add self.add_update to the call function.
I have used Keras and TensorFlow to classify the Fashion MNIST following this tutorial .
It uses the AdamOptimizer to find the value for model parameters that minimize the loss function of the network. The input for the network is a 2-D tensor with shape [28, 28], and output is a 1-D tensor with shape [10] which is the result of a softmax function.
Once the network has been trained, I want to use the optimizer for another task: find an input that maximizes one of the elements of the output tensor. How can this be done? Is it possible to do so using Keras or one have to use a lower level API?
Since the input is not unique for a given output, it would be even better if we could impose some constraints on the values the input can take.
The trained model has the following format
model = keras.Sequential([
keras.layers.Flatten(input_shape=(28, 28)),
keras.layers.Dense(128, activation=tf.nn.relu),
keras.layers.Dense(10, activation=tf.nn.softmax)
])
I feel you would want to backprop with respect to the input freezing all the weights to your model. What you could do is:
Add a dense layer after the input layer with the same dimensions as input and set it as trainable
Freeze all the other layers of your model. (except the one you added)
As an input, feed an identity matrix and train your model based on whatever output you desire.
This article and this post might be able to help you if you want to backprop based on the input instead. It's a bit like what you are aiming for but you can get the intuition.
It would be very similar to the way that filters of a Convolutional Network is visualized: we would do gradient ascent optimization in input space to maximize the response of a particular filter.
Here is how to do it: after training is finished, first we need to specify the output and define a loss function that we want to maximize:
from keras import backend as K
output_class = 0 # the index of the output class we want to maximize
output = model.layers[-1].output
loss = K.mean(output[:,output_class]) # get the average activation of our desired class over the batch
Next, we need to take the gradient of the loss we have defined above with respect to the input layer:
grads = K.gradients(loss, model.input)[0] # the output of `gradients` is a list, just take the first (and only) element
grads = K.l2_normalize(grads) # normalize the gradients to help having an smooth optimization process
Next, we need to define a backend function that takes the initial input image and gives the values of loss and gradients as outputs, so that we can use it in the next step to implement the optimization process:
func = K.function([model.input], [loss, grads])
Finally, we implement the gradient ascent optimization process:
import numpy as np
input_img = np.random.random((1, 28, 28)) # define an initial random image
lr = 1. # learning rate used for gradient updates
max_iter = 50 # number of gradient updates iterations
for i in range(max_iter):
loss_val, grads_val = func([input_img])
input_img += grads_val * lr # update the image based on gradients
Note that, after this process is finished, to display the image you may need to make sure that all the values in the image are in the range [0, 255] (or [0,1]).
After the hints Saket Kumar Singh gave in his answer, I wrote the following that seems to solve the question.
I create two custom layers. Maybe Keras offers already some classes that are equivalent to them.
The first on is a trainable input:
class MyInputLayer(keras.layers.Layer):
def __init__(self, output_dim, **kwargs):
self.output_dim = output_dim
super(MyInputLayer, self).__init__(**kwargs)
def build(self, input_shape):
self.kernel = self.add_weight(name='kernel',
shape=self.output_dim,
initializer='uniform',
trainable=True)
super(MyInputLayer, self).build(input_shape)
def call(self, x):
return self.kernel
def compute_output_shape(self, input_shape):
return self.output_dim
The second one gets the probability of the label of interest:
class MySelectionLayer(keras.layers.Layer):
def __init__(self, position, **kwargs):
self.position = position
self.output_dim = 1
super(MySelectionLayer, self).__init__(**kwargs)
def build(self, input_shape):
super(MySelectionLayer, self).build(input_shape)
def call(self, x):
mask = np.array([False]*x.shape[-1])
mask[self.position] = True
return tf.boolean_mask(x, mask,axis=1)
def compute_output_shape(self, input_shape):
return self.output_dim
I used them in this way:
# Build the model
layer_flatten = keras.layers.Flatten(input_shape=(28, 28))
layerDense1 = keras.layers.Dense(128, activation=tf.nn.relu)
layerDense2 = keras.layers.Dense(10, activation=tf.nn.softmax)
model = keras.Sequential([
layer_flatten,
layerDense1,
layerDense2
])
# Compile the model
model.compile(optimizer=tf.train.AdamOptimizer(),
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
# Train the model
# ...
# Freeze the model
layerDense1.trainable = False
layerDense2.trainable = False
# Build another model
class_index = 7
layerInput = MyInputLayer((1,784))
layerSelection = MySelectionLayer(class_index)
model_extended = keras.Sequential([
layerInput,
layerDense1,
layerDense2,
layerSelection
])
# Compile it
model_extended.compile(optimizer=tf.train.AdamOptimizer(),
loss='mean_absolute_error')
# Train it
dummyInput = np.ones((1,1))
target = np.ones((1,1))
model_extended.fit(dummyInput, target,epochs=300)
# Retrieve the weights of layerInput
layerInput.get_weights()[0]
Interesting. Maybe a solution would be to feed all your data to the network and for each sample save the output_layer after softmax.
This way, for 3 classes, where you want to find the best input for class 1, you are looking for outputs where the first component is high. For example: [1 0 0]
Indeed the output means the probability, or the confidence of the network, for the sample being one of the classes.
Funny coincident I was just working on the same "problem". I'm interested in the direction of adversarial training etc. What I did was to insert a LocallyConnected2D Layer after the input and then train with data which is all one and has as targets the class of interest.
As model I use
batch_size = 64
num_classes = 10
epochs = 20
input_shape = (28, 28, 1)
inp = tf.keras.layers.Input(shape=input_shape)
conv1 = tf.keras.layers.Conv2D(32, kernel_size=(3, 3),activation='relu',kernel_initializer='he_normal')(inp)
pool1 = tf.keras.layers.MaxPool2D((2, 2))(conv1)
drop1 = tf.keras.layers.Dropout(0.20)(pool1)
flat = tf.keras.layers.Flatten()(drop1)
fc1 = tf.keras.layers.Dense(128, activation='relu')(flat)
norm1 = tf.keras.layers.BatchNormalization()(fc1)
dropfc1 = tf.keras.layers.Dropout(0.25)(norm1)
out = tf.keras.layers.Dense(num_classes, activation='softmax')(dropfc1)
model = tf.keras.models.Model(inputs = inp , outputs = out)
model.compile(loss=tf.keras.losses.categorical_crossentropy,
optimizer=tf.keras.optimizers.RMSprop(),
metrics=['accuracy'])
model.summary()
after training I insert the new layer
def insert_intermediate_layer_in_keras(model,position, before_layer_id):
layers = [l for l in model.layers]
if(before_layer_id==0) :
x = new_layer
else:
x = layers[0].output
for i in range(1, len(layers)):
if i == before_layer_id:
x = new_layer(x)
x = layers[i](x)
else:
x = layers[i](x)
new_model = tf.keras.models.Model(inputs=layers[0].input, outputs=x)
return new_model
def fix_model(model):
for l in model.layers:
l.trainable=False
fix_model(model)
new_layer = tf.keras.layers.LocallyConnected2D(1, kernel_size=(1, 1),
activation='linear',
kernel_initializer='he_normal',
use_bias=False)
new_model = insert_intermediate_layer_in_keras(model,new_layer,1)
new_model.compile(loss=tf.keras.losses.categorical_crossentropy,
optimizer=tf.keras.optimizers.RMSprop(),
metrics=['accuracy'])
and finally rerun training with my fake data.
X_fake = np.ones((60000,28,28,1))
print(Y_test.shape)
y_fake = np.ones((60000))
Y_fake = tf.keras.utils.to_categorical(y_fake, num_classes)
new_model.fit(X_fake, Y_fake, epochs=100)
weights = new_layer.get_weights()[0]
imshow(weights.reshape(28,28))
plt.show()
Results are not yet satisfying but I'm confident of the approach and guess I need to play around with the optimiser.
I'm very new to Keras and I'm writing a custom layer which implements Gaussian function [exp(-(w*x-mean)^2/sigma^2) where W, mean, sigma are all randomly generated].
Below is code for the custom layer:
class Gaussian(Layer):
def __init__(self,**kwargs):
super(Gaussian, self).__init__(**kwargs)
def build(self, input_shape):
# Create trainable weights for this layer.
self.W_init = np.random.rand(1,input_shape[1])
self.W = K.variable(self.W_init, name="W")
# Create trainable means for this layer.
self.mean_init = np.random.rand(1,input_shape[1])
self.mean = K.variable(self.mean_init, name="mean")
# Create trainable sigmas for this layer.
self.sigma_init = np.random.rand(1,input_shape[1])
self.sigma = K.variable(self.sigma_init, name="sigma")
self.trainable_weights = [self.mean, self.sigma]
super(Gaussian, self).build(input_shape) # Be sure to call this somewhere!
def call(self, x):
result = tf.multiply(x, self.W)
result = tf.subtract(x, self.mean)
result = tf.multiply(tf.square(result),-1)
result = tf.divide(result, tf.square(self.sigma))
return result
def compute_output_shape(self, input_shape):
return input_shape
After putting it as the first layer in a Keras mnist tutorial(just wanted to make sure it runs without producing errors, didn't care for accuracy) and training the model, it appeared that the loss stopped decreasing after around 4 epochs and only the numbers of "mean" and "sigma" changed after training while the numbers of "W" remains the same. However, this doesn't happen if I put it as the second layer.
I ran the Keras mnist tutorial again without the custom layer and found out that the weights of the first layer didn't change either.
Is not updating the weights of first layer(more specifically the very first parameter) a Keras thing or am I missing something? Can I force it to update?
Thank you!
You are not implementing your layer correctly, Keras is not aware of your weights, that means they are not being trained by gradient descent. Take a look at this example:
from keras import backend as K
from keras.engine.topology import Layer
import numpy as np
class MyLayer(Layer):
def __init__(self, output_dim, **kwargs):
self.output_dim = output_dim
super(MyLayer, self).__init__(**kwargs)
def build(self, input_shape):
# Create a trainable weight variable for this layer.
self.kernel = self.add_weight(name='kernel',
shape=(input_shape[1], self.output_dim),
initializer='uniform',
trainable=True)
super(MyLayer, self).build(input_shape) # Be sure to call this at the end
def call(self, x):
return K.dot(x, self.kernel)
def compute_output_shape(self, input_shape):
return (input_shape[0], self.output_dim)
Here you have to use add_weight to obtain a trainable weight, not just use K.variable as you are currently doing. This way your weights will be registered with Keras and they will be trained properly. You should do this for all trainable parameters in your layer.