I would like to create a custom loss function that has a weight term that's updated based on what epoch I'm in.
For example:
Let's say I have a loss function which has a beta weight, where beta increases over the first 20 epochs...
def custom_loss(x, x_pred):
    loss1 = objectives.binary_crossentropy(x, x_pred)
    loss2 = objectives.mse(x, x_pred)
    return (beta * current_epoch / 20) * loss1 + loss2
How could I implement something like this into a keras loss function?
Looking at their documentation, they mention that you can use Theano/TensorFlow symbolic functions that return a scalar for each data point.
So you could do something like this:

loss = (tf.contrib.losses.softmax_cross_entropy(x, x_pred) * (beta * current_epoch / 20)
        + tf.contrib.losses.mean_squared_error(x, x_pred))

You would have to pass x and x_pred as tf.placeholders. I think you could still use Keras for model creation, but then you would have to run the computational graph yourself with sess.run().
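Alternatively, a common pattern that stays entirely inside model.fit() is to hold the epoch-dependent factor in a Keras backend variable and update it from a callback. This is a minimal sketch, not the approach above: it assumes the old keras.objectives API, and model and x_train are placeholder names for your own model and data.

from keras import backend as K
from keras import objectives
from keras.callbacks import LambdaCallback

beta = 1.0                      # assumed final weight for loss1
epoch_factor = K.variable(0.0)  # ramped from 0 to 1 by the callback below

def custom_loss(x, x_pred):
    loss1 = objectives.binary_crossentropy(x, x_pred)
    loss2 = objectives.mse(x, x_pred)
    return (beta * epoch_factor) * loss1 + loss2

# ramp the factor over the first 20 epochs, then hold it at 1
update_factor = LambdaCallback(
    on_epoch_begin=lambda epoch, logs: K.set_value(epoch_factor, min(epoch / 20.0, 1.0)))

model.compile(optimizer='adam', loss=custom_loss)
model.fit(x_train, x_train, epochs=50, callbacks=[update_factor])

Because the loss closes over the backend variable, the weight change takes effect on the next epoch without recompiling the model.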
References:
https://blog.keras.io/keras-as-a-simplified-interface-to-tensorflow-tutorial.html#using-keras-models-with-tensorflow
I'm currently implementing a custom loss function by modelling it as a Keras backend tensor. The loss function has several parts (such as a classification loss, a quantization loss, and a pairwise distance loss).
The code looks something like this:
...
different_class_loss = K.log(1 + K.exp(-1 * dissimilarity + margin))
pair_loss = same_class * same_class_loss + (1 - same_class) * different_class_loss
loss_value = lambda_pair * pair_loss + lambda_classif * classif_loss + lambda_quant_binary * quantization_loss

# Add loss to model
pairwise_model.add_loss(loss_value)

# Compile without specifying a loss
pairwise_model.compile(optimizer=optimizer_adam)
When I train the model using a batch generator and pairwise_model.fit(), the history contains exactly one loss entry, for the combined loss_value. For debugging purposes I'd like to monitor every part of that loss function individually (i.e. the quantization, classification, and pairwise distance losses), but I can't figure out how.
I tried implementing a callback using K.eval() or K.print_tensor() to retrieve the values during training, but that didn't work. I also wasn't able to add multiple loss metrics via the add_loss function.
Is there a way to do this without writing a custom training loop? It feels like there should be. Any help is greatly appreciated.
__________________________________________________
EDIT:
Following the idea from Dr. Snoopy, here is the code that ended up working for me:
...
different_class_loss = K.log(1 + K.exp(-1 * dissimilarity + margin))
pair_loss = same_class * same_class_loss + (1 - same_class) * different_class_loss
loss_value = lambda_pair * pair_loss + lambda_classif * classif_loss + lambda_quant_binary * quantization_loss

# Add loss to model
pairwise_model.add_loss(loss_value)

# Add the individual losses as metrics
pairwise_model.add_metric(pair_loss, name="pairwise loss")
pairwise_model.add_metric(quantization_loss, name="quantization loss")

# Compile without specifying a loss or metrics
pairwise_model.compile(optimizer=optimizer_adam)
You can pass them as metrics, like this:

def pl(y_true, y_pred):
    # Keras calls metric functions with (y_true, y_pred); both are ignored here
    return pair_loss

pairwise_model.compile(optimizer=optimizer_adam, metrics=[pl])

You can do the same for your other loss components. The wrapper function might not even be needed; you could also try passing pair_loss directly as a metric.
In TF2 Keras, I have trained an autoencoder using tensorflow.keras.losses.MeanSquaredError as the loss function. Now I want to train this model further using another loss function, specifically tensorflow.keras.losses.KLDivergence. The reason is that unsupervised learning is conducted initially for representation learning. Then, having the generated embeddings, I can cluster them and use the clusters for self-supervision, i.e. labels, enabling the second, supervised loss and improving the model further.
This is not transfer learning per se, as no new layers are added to the model, just the loss function is changed and the model continues training.
What I have tried is using the pretrained model with the MSE loss as a property of the new model:
class ClusterBooster(tf.keras.Model):
    def __init__(self, base_model, centers):
        super(ClusterBooster, self).__init__()
        self.pretrained = base_model
        self.centers = centers

    def train_step(self, data):
        with tf.GradientTape() as tape:
            loss = self.compiled_loss(self.P, self.Q, regularization_losses=self.losses)

        # Compute gradients
        gradients = tape.gradient(loss, self.trainable_variables)
        # Update weights
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))

        return {m.name: m.result() for m in self.metrics}
where the loss is the KL loss between the distributions P and Q. The distributions are computed in a callback function instead of the model's train_step, as I need access to the current epoch (P is updated every 5 epochs, not on each epoch):
def on_epoch_begin(self, epoch, logs=None):
    z = self.model.pretrained.embed(self.feature, training=True)
    z = tf.reshape(z, [tf.shape(z)[0], 1, tf.shape(z)[1]])  # reshape for broadcasting

    # CALCULATE Q FOR EVERY EPOCH
    partial = tf.math.pow(tf.norm(z - self.model.centers, axis=2, ord='euclidean'), 2)
    nominator = 1 / (1 + partial)
    denominator = tf.math.reduce_sum(1 / (1 + partial))
    self.model.Q = nominator / denominator

    # CALCULATE P EVERY 5 EPOCHS TO AVOID INSTABILITY
    if epoch % 5 == 0:
        partial = tf.math.pow(self.model.Q, 2) / tf.math.reduce_sum(self.model.Q, axis=1, keepdims=True)
        nominator = partial
        denominator = tf.math.reduce_sum(partial, axis=0)
        self.model.P = nominator / denominator
However, when apply_gradients() is executed I get:
ValueError: No gradients provided for any variable: ['dense/kernel:0', 'dense/bias:0', 'dense_1/kernel:0', 'dense_1/bias:0', 'dense_2/kernel:0', 'dense_2/bias:0', 'dense_3/kernel:0', 'dense_3/bias:0']
I think this is because the pretrained model is not set up to be trained further anywhere inside the new model (only the embed() method is called, which does not train the model). Is this a correct approach and am I just missing something, or is there a better way?
It seems that whatever computation takes place in a callback isn't tracked for gradient computation and weight updates. Thus, these computations should be put inside the train_step() function of the custom Model class (ClusterBooster).
Given that I don't have access to the number of epochs inside the train_step() function of ClusterBooster, I created a custom training loop without a Model class, where I could use plain Python code (which is executed eagerly); a sketch follows.
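For reference, here is a minimal sketch of such a loop. base_model, features, and centers are assumed names for the pretrained model, the input data, and the cluster centers; the P/Q math is taken from the callback above.

import tensorflow as tf

kl = tf.keras.losses.KLDivergence()
optimizer = tf.keras.optimizers.Adam()
num_epochs = 100  # assumed

def soft_assignments(model, features, centers):
    # same Q computation as in the callback, kept in one helper
    z = model.embed(features, training=True)
    z = tf.reshape(z, [tf.shape(z)[0], 1, tf.shape(z)[1]])
    partial = tf.math.pow(tf.norm(z - centers, axis=2, ord='euclidean'), 2)
    return (1 / (1 + partial)) / tf.math.reduce_sum(1 / (1 + partial))

for epoch in range(num_epochs):
    if epoch % 5 == 0:
        # target distribution P, frozen for the next 5 epochs (no gradient needed here)
        Q = soft_assignments(base_model, features, centers)
        partial = tf.math.pow(Q, 2) / tf.math.reduce_sum(Q, axis=1, keepdims=True)
        P = partial / tf.math.reduce_sum(partial, axis=0)
    with tf.GradientTape() as tape:
        # Q must be recomputed INSIDE the tape so gradients reach the weights
        Q = soft_assignments(base_model, features, centers)
        loss = kl(P, Q)
    grads = tape.gradient(loss, base_model.trainable_variables)
    optimizer.apply_gradients(zip(grads, base_model.trainable_variables))

The key difference from the callback version is that the embedding and Q are computed inside the GradientTape, so the "No gradients provided" error goes away.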
I am working on creating a custom loss function in Keras.
Here is an example.
import keras.backend as K

def test(y_true, y_pred):
    loss = K.square(y_pred - y_true)
    loss = K.mean(loss, axis=1)
    return loss
Now in this example, I would like to subtract only, let's say, specific values from y_pred; but since this is in TensorFlow, how do I iterate through them? For example, can I iterate through y_pred to pick values? And how?
Let's say for this example the batch size is 5.
I have tried things such as
y_pred[0...i]
tf.arange and many more...
Just pass it when you are compiling the model, like this:

model.compile(optimizer='sgd', loss=test)

Keras will iterate over it automatically. You also have an indentation error in the return statement.
import keras.backend as K

def test(y_true, y_pred):
    loss = K.square(y_pred - y_true)
    loss = K.mean(loss, axis=1)
    return loss

def test_accuracy(y_true, y_pred):
    return 1 - test(y_true, y_pred)
This way you can pass your custom loss function to the model, and you can pass an accuracy function similarly:

model.compile(optimizer='sgd', loss=test, metrics=[test_accuracy])
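If the goal is to use only some entries of y_pred inside the loss, you do not iterate in Python; you slice or gather on the symbolic tensor instead. A minimal sketch (the column indices [0, 2] are just an example, not from the question):

import tensorflow as tf
import keras.backend as K

def partial_mse(y_true, y_pred):
    # keep only columns 0 and 2 of each sample (example indices)
    picked_true = tf.gather(y_true, [0, 2], axis=1)
    picked_pred = tf.gather(y_pred, [0, 2], axis=1)
    return K.mean(K.square(picked_pred - picked_true), axis=1)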
I am a deep learning and Tensorflow beginner and I am trying to implement the algorithm in this paper using Tensorflow. This paper uses Matconvnet+Matlab to implement it, and I am curious if Tensorflow has the equivalent functions to achieve the same thing. The paper said:
The network parameters were initialized using the Xavier method [14]. We used the regression loss across four wavelet subbands under l2 penalty and the proposed network was trained by using the stochastic gradient descent (SGD). The regularization parameter (λ) was 0.0001 and the momentum was 0.9. The learning rate was set from 10^-1 to 10^-4, which was reduced in log scale at each epoch.
This paper uses the wavelet transform (WT) and a residual learning method (where the residual image = WT(HR) - WT(HR'), and the HR' is used for training). The Xavier method suggests initializing the variables from a normal distribution with
stddev = sqrt(2 / (filter_size * filter_size * num_filters))
Q1. How should I initialize the variables? Is the code below correct?
weights = tf.Variable(tf.random_normal[img_size, img_size, 1, num_filters], stddev=stddev)
This paper does not explain how to construct the loss function in detail. I am unable to find the equivalent TensorFlow function to set the learning rate in log scale (only exponential_decay). I understand MomentumOptimizer is equivalent to stochastic gradient descent with momentum.
Q2: Is it possible to set the learning rate in log scale?
Q3: How to create the loss function described above?
I followed this website to write the code below. Assume the model() function returns the network mentioned in this paper and λ = 0.0001.
inputs = tf.placeholder(tf.float32, shape=[None, patch_size, patch_size, num_channels])
labels = tf.placeholder(tf.float32, [None, patch_size, patch_size, num_channels])

# get the model output and weights for each conv
pred, weights = model()

# define loss function
loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=pred)
regularizers = 0.0  # accumulate the l2 penalty over all conv weights
for weight in weights:
    regularizers += tf.nn.l2_loss(weight)
loss = tf.reduce_mean(loss + 0.0001 * regularizers)

learning_rate = tf.train.exponential_decay(???)  # Not sure if we can have a custom learning rate for log scale
optimizer = tf.train.MomentumOptimizer(learning_rate, momentum).minimize(loss, global_step)
NOTE: As I am a deep learning/TensorFlow beginner, I copy-pasted code from here and there, so please feel free to correct it if you can ;)
Q1. How should I initialize the variables? Is the code below correct?
Use tf.get_variable, or switch to slim (it does the initialization automatically for you); a minimal sketch follows.
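For instance, a sketch with tf.get_variable and TF1's contrib Xavier initializer; the shape values are assumed placeholders, not from the question:

import tensorflow as tf

filter_size = 3   # assumed
num_filters = 64  # assumed

weights = tf.get_variable(
    'conv1_weights',
    shape=[filter_size, filter_size, 1, num_filters],
    initializer=tf.contrib.layers.xavier_initializer())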
Q2: Is it possible to set the learning rate in log scale?
You can, but do you need it? This is not the first thing that you need to solve in this network. Please check Q3.
However, just for reference, it would use the following notation (note that exponential_decay needs a global_step to count batches):

global_step = tf.Variable(0, trainable=False)
learning_rate_node = tf.train.exponential_decay(learning_rate=0.001, global_step=global_step, decay_steps=10000, decay_rate=0.98, staircase=True)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate_node).minimize(loss, global_step=global_step)
Q3: How to create the loss function described above?
First of all, you have not shown the "pred"-to-"image" conversion in this post (based on the paper, you need to apply a subtraction and the IDWT to obtain the final image).
There is one problem here: the logits have to be calculated based on your label data, i.e. if you use the marked data as "Y : label", you need to write

pred = model()
pred = tf.matmul(pred, weights) + biases
logits = tf.nn.softmax(pred)
loss = tf.reduce_mean(tf.abs(logits - labels))

This will train the network to output "Y : label" directly.
If your dataset's labeled images are the denoised ones, you need to follow this one instead:

pred = model()
pred = tf.matmul(pred, weights) + biases
logits = tf.nn.softmax(pred)
image = apply_IDWT("X : input", logits)  # this will apply IDWT(x_label - y_label)
loss = tf.reduce_mean(tf.abs(image - labels))
The logits are the output of your network; you use them as the result to calculate the rest. Instead of a matmul, you can add a conv2d layer here, without batch normalization or an activation function, and set the output feature count to 4. Example:

pred = model()
pred = slim.conv2d(pred, 4, [3, 3], activation_fn=None, padding='SAME', scope='output')
logits = tf.nn.softmax(pred)
image = apply_IDWT("X : input", logits)  # this will apply IDWT(x_label - y_label)
loss = tf.reduce_mean(tf.abs(image - labels))
This loss function will give you basic training capabilities. However, it is the L1 distance, and it may suffer from some issues. Consider the following situation:
Let's say you have the following array as output, [10, 10, 10, 0, 0], and you try to achieve [10, 10, 10, 10, 10]. In this case, your loss is 20 (10 + 10), yet you have 3/5 success. It may also indicate some overfitting.
For the same target, consider the output [6, 6, 6, 6, 6]. It still has a loss of 20 (4 + 4 + 4 + 4 + 4). However, if you apply a threshold of 5, you achieve 5/5 success. Hence, this is the case that we want.
If you use L2 loss instead, for the first case you get 10^2 + 10^2 = 200 as the loss output, while for the second case you get 4^2 * 5 = 80.
Hence, the optimizer will try to run away from case #1 as quickly as possible to achieve global success, rather than perfect success on some outputs and complete failure on the others. You can apply a loss function like this for that:
tf.reduce_mean(tf.nn.l2_loss(logits - image))
Alternatively, you can check the cross-entropy loss function (it applies softmax internally, so do not apply softmax twice):

tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=image, logits=pred))
Q1. How should I initialize the variables? Is the code below correct?
That's correct, although it's missing a parenthesis; it should be tf.random_normal([img_size, img_size, 1, num_filters], stddev=stddev). You could also look into tf.get_variable if the variables are going to be reused.
Q2: Is it possible to set the learning rate in log scale?
Exponential decay decreases the learning rate at every step. I think what you want is tf.train.piecewise_constant, setting the boundaries at each epoch; a minimal sketch follows.
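For instance, a sketch of that approach, where steps_per_epoch and num_epochs are assumed values, and the paper's range of 10^-1 to 10^-4 is spread logarithmically across the epochs:

import numpy as np
import tensorflow as tf

steps_per_epoch = 1000  # assumed
num_epochs = 4          # assumed

global_step = tf.train.get_or_create_global_step()
boundaries = [steps_per_epoch * e for e in range(1, num_epochs)]
values = list(np.logspace(-1, -4, num_epochs))  # 1e-1 ... 1e-4, log-spaced
learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)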
EDIT: Look at the other answer, use the staircase=True argument!
Q3: How to create the loss function described above?
Your loss function looks correct.
The other answers are very detailed and helpful. Here is a code example that uses a placeholder to decay the learning rate on a log scale. HTH.
import tensorflow as tf
import numpy as np

# data simulation
N = 10000
D = 10
x = np.random.rand(N, D)
w = np.random.rand(D, 1)
y = np.dot(x, w)
print(y.shape)

# modeling
batch_size = 100
tni = tf.truncated_normal_initializer()
X = tf.placeholder(tf.float32, [batch_size, D])
Y = tf.placeholder(tf.float32, [batch_size, 1])
W = tf.get_variable("w", shape=[D, 1], initializer=tni)
B = tf.Variable(tf.zeros([1]))  # make the bias a variable so it is trained too
lr = tf.placeholder(tf.float32)

pred = tf.add(tf.matmul(X, W), B)
print(pred.shape)
mse = tf.reduce_sum(tf.losses.mean_squared_error(Y, pred))
opt = tf.train.MomentumOptimizer(lr, 0.9)
train_op = opt.minimize(mse)

learning_rate = 0.0001
do_train = True
acc_err = 0.0
sess = tf.Session()
sess.run(tf.global_variables_initializer())
while do_train:
    for i in range(100000):
        if i > 0 and i % N == 0:
            # epoch done, halve the learning rate (log-scale decay)
            learning_rate /= 2
            print("Epoch completed. LR =", learning_rate)
        idx = i // batch_size + i % batch_size
        f = {X: x[idx:idx + batch_size, :], Y: y[idx:idx + batch_size, :], lr: learning_rate}
        _, err = sess.run([train_op, mse], feed_dict=f)
        acc_err += err
        if i % 5000 == 0:
            print("Average error = {}".format(acc_err / 5000))
            acc_err = 0.0
    do_train = False  # stop after one pass of 100000 iterations
I am trying to implement a custom objective function in the Keras framework: a weighted average function that takes the two tensor arguments y_true and y_pred, where the weight information is derived from the y_true tensor.
Is there a weighted average function in TensorFlow?
Or any other suggestions on how to implement this kind of loss function?
My function would look something like this:
function(y_true, y_pred):
    A = (y_true - y_pred)**2
    w = ...  # derivable from y_true, a tensor of the same shape as y_true
    return average(A, weights=w)  # <-- a scalar
y_true and y_pred are 3D tensors.
You can use one of the existing objectives (also called losses) that ship with Keras.
You may also implement your own custom loss function:

from keras import backend as K

def my_loss(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)

# Let's train the model using SGD
model.compile(loss=my_loss, optimizer='SGD', metrics=['accuracy'])

Notice the K module: it's the Keras backend, which you should use to fully utilize Keras's performance. Don't do something like this unless you don't care about performance issues:

def my_bad_and_slow_loss(y_true, y_pred):
    return sum((y_pred - y_true) ** 2, axis=-1)

For your specific case, please write your desired objective function if you need help writing it.
Update
You can try this to provide the weights W in the loss function:

import numpy as np

def my_loss(y_true, y_pred):
    W = np.arange(9) / 9.  # some example W
    return K.mean(K.pow(y_true - y_pred, 2) * W)
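Since the question derives the weights from y_true, here is a minimal sketch of that as well; the weighting rule (absolute target magnitude) is only an assumed example, and model stands for your own model:

from keras import backend as K

def weighted_mse(y_true, y_pred):
    # hypothetical weighting: weight each squared error by |y_true|
    w = K.abs(y_true)
    # weighted average over all elements, returned as a scalar
    return K.sum(K.square(y_true - y_pred) * w) / (K.sum(w) + K.epsilon())

model.compile(loss=weighted_mse, optimizer='SGD')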