add_update() in my custom Keras layer doesn't update the weights - python

So I'm implementing Center loss (https://ydwen.github.io/papers/WenECCV16.pdf) and I am having a problem updating the weights in my layer, which here means updating the centers in Center loss. When I print my class_centers like this: tf.print(self.class_centers, summarize=-1, output_stream='file:///tensors.txt'), they never change. When I print other Variables they seem fine, so the only problem I can think of is that add_update() doesn't do what it should.
The custom layer:
class CenterLossLayer(Layer):
    def __init__(self, alpha=0.5, **kwargs):
        self.alpha = alpha
        super(CenterLossLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        print('Center loss input 1 (feature_size): ', input_shape[0][1])
        print('Center loss input 2 (num_classes): ', input_shape[1][1])
        self.class_centers = self.add_weight(name='class_centers',
                                             shape=(input_shape[1][1], input_shape[0][1]),
                                             initializer='uniform',
                                             trainable=False)
        super(CenterLossLayer, self).build(input_shape)

    def call(self, x, mask=None):
        embeddings, one_hots = x
        tf.print(self.class_centers, summarize=-1, output_stream='file:///tensors.txt')
        batch_centers = K.dot(one_hots, self.class_centers)
        batch_delta = batch_centers - embeddings
        class_delta = K.dot(K.transpose(one_hots), batch_delta)
        counts = K.sum(K.transpose(one_hots), axis=1, keepdims=True) + 1
        class_delta = class_delta / counts
        class_delta = K.in_train_phase(self.alpha * class_delta, 0 * class_delta)
        updated_class_centers = self.class_centers - class_delta
        self.add_update((self.class_centers, updated_class_centers), x[0])
        losses = K.sum(K.square(embeddings - batch_centers), axis=1, keepdims=True)
        return losses

    def compute_output_shape(self, input_shape):
        return (input_shape[1][0], )
and the final loss is:
def batch_mean_loss(y_true, y_pred):
    return K.mean(y_pred, axis=0)
where y_pred is losses from CenterLossLayer.
The weird thing is that even though the centers are not updating, the center loss goes down with each epoch and the final model is better than the one trained only with Softmax loss.
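To make the intended update concrete, the arithmetic in call() can be traced by hand outside Keras. The following is only a NumPy sketch of one training-phase step, not the layer itself; the function name center_update and the toy values are made up for illustration.

```python
import numpy as np


def center_update(centers, embeddings, one_hots, alpha=0.5):
    """One training-phase center update, mirroring the arithmetic in call()."""
    batch_centers = one_hots @ centers            # center of each sample's class
    batch_delta = batch_centers - embeddings      # per-sample offset from its center
    class_delta = one_hots.T @ batch_delta        # offsets summed per class
    counts = one_hots.sum(axis=0)[:, None] + 1    # samples per class, plus 1
    return centers - alpha * class_delta / counts


# Toy data (hypothetical): 2 classes, 2-D embeddings, one sample per class.
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
embeddings = np.array([[1.0, 0.0], [3.0, 1.0]])
one_hots = np.array([[1.0, 0.0], [0.0, 1.0]])

updated = center_update(centers, embeddings, one_hots)
print(updated)
```

With one sample per class and alpha = 0.5, each center moves a quarter of the way toward its sample (alpha divided by count + 1). If the stored centers never show this kind of movement, the update is not being applied.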

So I checked out how add_update() is used in BatchNormalization layer:
self.add_update([K.moving_average_update(self.moving_mean,
                                         mean,
                                         self.momentum),
                 K.moving_average_update(self.moving_variance,
                                         variance,
                                         self.momentum)],
                inputs)
The thing is that the first argument of method add_update() is "updates: Update op" and moving_average_update() returns "An operation to update the variable.". So I guess that add_update() requires some sort of operation and moving_average_update() returns that. I don't know how to create this operation, so instead I did:
self.add_update(K.moving_average_update(self.class_centers, updated_class_centers, 0.0), x)
With a momentum of 0.0 this simply replaces self.class_centers with updated_class_centers, and it works.
Even though it works, I would appreciate it if anyone knows how to do this properly.
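Why a momentum of 0.0 turns the moving average into a plain assignment can be checked with the usual exponential-moving-average formula. This is a pure-Python sketch of that formula, assuming moving_average_update() follows it; the helper name moving_average is hypothetical and not part of Keras.

```python
import math


def moving_average(old, new, momentum):
    """Standard exponential moving average: keep `momentum` of the old value."""
    return momentum * old + (1.0 - momentum) * new


old_center, new_center = 2.0, 5.0

# momentum = 0.0 keeps nothing of the old value, so the result is the new value.
replaced = moving_average(old_center, new_center, 0.0)

# momentum = 0.9 moves only 10% of the way toward the new value.
smoothed = moving_average(old_center, new_center, 0.9)

print(replaced, smoothed)
```

So the workaround is equivalent to assigning updated_class_centers to the variable, just expressed through an op that add_update() accepts.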

Looks like you should do something like this:
class ComputeSum(keras.layers.Layer):
    def __init__(self, input_dim):
        super(ComputeSum, self).__init__()
        self.total = tf.Variable(initial_value=tf.zeros((input_dim,)), trainable=False)

    def call(self, inputs):
        self.total.assign_add(tf.reduce_sum(inputs, axis=0))
        return self.total
Snippet taken from https://keras.io/guides/making_new_layers_and_models_via_subclassing/#layers-can-have-nontrainable-weighto

Related

How to create a linear combination layer in Keras? [duplicate]

I am trying to implement a weighted average between two tensors in TensorFlow, where the weight can be learned automatically. Following the advice on how to design a custom layer for a keras model here, my attempt is the following:
class WeightedAverage(tf.keras.layers.Layer):
    def __init__(self):
        super(WeightedAverage, self).__init__()
        init_value = tf.keras.initializers.Constant(value=0.5)
        self.w = self.add_weight(name="weight",
                                 initializer=init_value,
                                 trainable=True)

    def call(self, inputs):
        return tf.keras.layers.average([inputs[0] * self.w,
                                        inputs[1] * (1 - self.w)])
Now the problem is that after training the model, saving, and loading it again, the value for w remains 0.5. Is it possible that the parameter does not receive any gradient updates? When printing the trainable variables of my model, the parameter is listed and should therefore be included when calling model.fit.
Here is one way to implement a weighted average between two tensors, where the weights are learned automatically. I also introduce the constraint that the weights must sum to 1. To guarantee this, we simply apply a softmax to the weights. In the dummy example below I use this method to combine the outputs of two fully-connected branches, but you can apply it in any other scenario.
Here is the custom layer:
# Imports assumed by the snippets in this answer (not shown in the original post)
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Layer, Input, Dense, Concatenate
from tensorflow.keras.models import Model

class WeightedAverage(Layer):
    def __init__(self):
        super(WeightedAverage, self).__init__()

    def build(self, input_shape):
        # one raw weight per input tensor
        self.W = self.add_weight(
            shape=(1,1,len(input_shape)),
            initializer='uniform',
            dtype=tf.float32,
            trainable=True)

    def call(self, inputs):
        # inputs is a list of tensor of shape [(n_batch, n_feat), ..., (n_batch, n_feat)]
        # expand last dim of each input passed [(n_batch, n_feat, 1), ..., (n_batch, n_feat, 1)]
        inputs = [tf.expand_dims(i, -1) for i in inputs]
        inputs = Concatenate(axis=-1)(inputs) # (n_batch, n_feat, n_inputs)
        weights = tf.nn.softmax(self.W, axis=-1) # (1,1,n_inputs)
        # weights sum up to one on last dim
        return tf.reduce_sum(weights*inputs, axis=-1) # (n_batch, n_feat)
Here is the full example in a regression problem:
inp1 = Input((100,))
inp2 = Input((100,))
x1 = Dense(32, activation='relu')(inp1)
x2 = Dense(32, activation='relu')(inp2)
W_Avg = WeightedAverage()([x1,x2])
out = Dense(1)(W_Avg)
m = Model([inp1,inp2], out)
m.compile('adam','mse')
n_sample = 1000
X1 = np.random.uniform(0,1, (n_sample,100))
X2 = np.random.uniform(0,1, (n_sample,100))
y = np.random.uniform(0,1, (n_sample,1))
m.fit([X1,X2], y, epochs=10)
In the end, you can also inspect the learned weights this way:
tf.nn.softmax(m.get_weights()[-3]).numpy()
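The reason the softmax guarantees weights that sum to 1 can be seen without TensorFlow. Below is a small pure-Python sketch of the same normalization; the raw weight values and the helper names softmax and weighted_average are made up for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list of raw weights."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_average(vectors, raw_weights):
    """Combine equally sized vectors using softmax-normalized weights."""
    weights = softmax(raw_weights)
    return [sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(len(vectors[0]))]


raw = [0.03, -0.02]            # hypothetical raw values, e.g. after 'uniform' init
weights = softmax(raw)
combined = weighted_average([[1.0, 2.0], [3.0, 4.0]], raw)
print(weights, combined)
```

Whatever values the optimizer pushes the raw weights to, the normalized weights stay positive and sum to 1, so the layer output always remains a convex combination of its inputs.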

Does adding a forward hook to a layer ensure that the gradient of the loss calculated using the layer's output will be calculated automatically?

I have a model
# Imports assumed by this snippet (not shown in the original post)
import torch.nn as nn
from torchvision import models

class NewModel(nn.Module):
    def __init__(self, output_layer, *args):
        super().__init__(*args)
        self.output_layer = output_layer
        self.selected_out = None
        #PRETRAINED MODEL
        self.pretrained = models.resnet18(pretrained=True)
        #TAKING OUTPUT FROM AN INTERMEDIATE LAYER
        #self._layers = []
        for l in list(self.pretrained._modules.keys()):
            #self._layers.append(l)
            if l == self.output_layer:
                handle = getattr(self.pretrained, l).register_forward_hook(self.hook)

    def hook(self, module, input, output):
        # store the output of the selected intermediate layer
        self.selected_out = output

    def forward(self, x):
        return self.pretrained(x)
I have two target outputs: one is the usual label of an image, and the second, called target_feature, has the same dimensions as the output obtained from self.output_layer.
out = model(img)
layerout = model.selected_out
Now, if I want to calculate the loss of layerout with the target feature map, can it be done like the line written below?
loss = criterion(y_true, out) + feature_criterion(layerout, target_feature)
Or do I need to add backward_hooks?
In this Kaggle notebook
https://www.kaggle.com/sironghuang/understanding-pytorch-hooks
it is written that loss.backward() cannot be used when using backward_hooks.
Quoting the author
# backprop once to get the backward hook results
out.backward(torch.tensor([1,1],dtype=torch.float),retain_graph=True)
#! loss.backward(retain_graph=True) # doesn't work with backward hooks,
#! since it's not a network layer but an aggregated result from the outputs of last layer vs target
Then how can the gradient be calculated based on the loss function?
If I understand you correctly, you want to get two outputs from your model, calculate two losses, then combine them and backpropagate. I imagine you come from TensorFlow and Keras, judging by the way you tried implementing it. In PyTorch, it's actually fairly straightforward; you can do this very easily because of its purely functional aspect.
This is just an example:
class NewModel(nn.Module):
    def __init__(self, output_layer, *args):
        super().__init__()
        self.pretrained = models.resnet18(pretrained=True)
        self.output_layer = output_layer

    def forward(self, x):
        out = self.pretrained(x)
        features = self.output_layer(out)
        return out, features
On inference, you will get two results per call:
>>> m = NewModel(nn.Linear(1000, 10))
>>> x = torch.rand(16, 3, 224, 224)
>>> y_pred, y_feature = m(x)
Call your loss functions:
>>> loss = criterion(y_pred, y_true) + feature_criterion(y_feature, target_feature)
Then, backpropagate with loss.backward().
So there is no need for hooks, nor for a complicated gradient argument in your .backward call!
Edit - If you wish to extract an intermediate layer output, keep the hook; that's fine. Just modify the forward definition:
    def forward(self, x):
        out = self.pretrained(x)
        return out, self.selected_out
For example:
>>> m = NewModel(output_layer='layer1')
>>> x = torch.rand(16, 3, 224, 224)
>>> y_pred, y_feature = m(x)
>>> y_pred.shape, y_feature.shape
(torch.Size([16, 1000]), torch.Size([16, 64, 56, 56]))
Also, what I said above about the loss still stands. Compute your loss, then call loss.backward().

Pytorch parameters won't update with custom loss function (Pytorch)

I am trying to use the optimizer to tune a set of parameters for a cost function that includes, among other things, a forward pass across a neural network. The parameters specify the means and variances of the weights of this neural network. However, when updating the parameters at every iteration of the optimization process, all the terms of the cost function except the one belonging to the forward pass contribute to the parameter update. That is, if all other terms are commented out, no parameters will update. Are there any ways of fixing this issue?
EDIT: I added a contrived example below.
import torch

class TestNN(torch.nn.Module):
    def __init__(self):
        super(TestNN, self).__init__()
        self.fc1 = torch.nn.Linear(10, 1)

    def forward(self, x):
        x = self.fc1(x)
        return x

    def getParameters(self):
        return [self.fc1.weight.transpose(0, 1), self.fc1.bias]

    def setParameters(self, parameters):
        # Can anything be done here to keep parameters in the graph?
        weight, bias = parameters
        self.fc1.weight = torch.nn.Parameter(weight.transpose(0, 1))
        self.fc1.bias = torch.nn.Parameter(bias)

def computeCost(parameters, input):
    testNN = TestNN()
    testNN.setParameters(parameters)
    cost = testNN(input) ** 2
    print(cost) # Cost stays the same :(
    return cost

def minimizeLoss(maxIter, optimizer, lossFunc, lossFuncArgs):
    for i in range(maxIter):
        optimizer.zero_grad()
        loss = lossFunc(*lossFuncArgs)
        loss.backward(retain_graph = True)
        optimizer.step()
        if i % 100 == 0:
            print(loss)

input = torch.randn(1, 10)
weight = torch.ones(10, 1)
bias = torch.ones(1, 1)
parameters = (weight, bias)
lossArgs = (parameters, input)
optimizer = torch.optim.Adam(parameters, lr = 0.01)
minimizeLoss(10, optimizer, computeCost, lossArgs)

Keras weights of first layer didn't change

I'm very new to Keras and I'm writing a custom layer which implements a Gaussian function [exp(-(w*x-mean)^2/sigma^2), where W, mean and sigma are all randomly generated].
Below is code for the custom layer:
class Gaussian(Layer):
    def __init__(self,**kwargs):
        super(Gaussian, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create trainable weights for this layer.
        self.W_init = np.random.rand(1,input_shape[1])
        self.W = K.variable(self.W_init, name="W")
        # Create trainable means for this layer.
        self.mean_init = np.random.rand(1,input_shape[1])
        self.mean = K.variable(self.mean_init, name="mean")
        # Create trainable sigmas for this layer.
        self.sigma_init = np.random.rand(1,input_shape[1])
        self.sigma = K.variable(self.sigma_init, name="sigma")
        self.trainable_weights = [self.mean, self.sigma]
        super(Gaussian, self).build(input_shape) # Be sure to call this somewhere!

    def call(self, x):
        result = tf.multiply(x, self.W)
        result = tf.subtract(x, self.mean)
        result = tf.multiply(tf.square(result),-1)
        result = tf.divide(result, tf.square(self.sigma))
        return result

    def compute_output_shape(self, input_shape):
        return input_shape
I put it as the first layer in a Keras MNIST tutorial (I just wanted to make sure it runs without producing errors and didn't care about accuracy) and trained the model. The loss stopped decreasing after around 4 epochs, and only the values of "mean" and "sigma" changed after training, while the values of "W" remained the same. However, this doesn't happen if I put it as the second layer.
I ran the Keras MNIST tutorial again without the custom layer and found that the weights of the first layer didn't change either.
Is not updating the weights of the first layer (more specifically the very first parameter) a Keras thing, or am I missing something? Can I force it to update?
Thank you!
You are not implementing your layer correctly: Keras is not aware of your weights, which means they are not being trained by gradient descent. Take a look at this example:
from keras import backend as K
from keras.engine.topology import Layer
import numpy as np

class MyLayer(Layer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        super(MyLayer, self).build(input_shape) # Be sure to call this at the end

    def call(self, x):
        return K.dot(x, self.kernel)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)
Here you have to use add_weight to obtain a trainable weight, not just use K.variable as you are currently doing. This way your weights will be registered with Keras and they will be trained properly. You should do this for all trainable parameters in your layer.
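Separately from how the weights are registered, the call() method in the question overwrites result on its second line, so the product with W never reaches the output. That alone would keep W constant regardless of registration. The NumPy sketch below reproduces just that arithmetic outside Keras; the function names and sample values are hypothetical, and the "chained" variant is only a guess at what was intended.

```python
import numpy as np


def call_as_written(x, W, mean, sigma):
    """Same steps as the question's call(): the x * W result is discarded."""
    result = x * W
    result = x - mean                  # overwrites the previous line
    result = -np.square(result)
    return result / np.square(sigma)


def call_chained(x, W, mean, sigma):
    """Assumed intent: subtract the mean from x * W instead of from x."""
    result = x * W - mean
    result = -np.square(result)
    return result / np.square(sigma)


x = np.array([[1.0, 2.0]])
mean = np.array([[0.5, 0.5]])
sigma = np.array([[1.0, 2.0]])
W_a = np.array([[1.0, 1.0]])
W_b = np.array([[3.0, 0.5]])

same = np.allclose(call_as_written(x, W_a, mean, sigma),
                   call_as_written(x, W_b, mean, sigma))
differs = not np.allclose(call_chained(x, W_a, mean, sigma),
                          call_chained(x, W_b, mean, sigma))
print(same, differs)
```

Because the output of the version as written does not depend on W at all, no gradient can flow to W, whether or not it is registered with add_weight.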
