Hi, I have been trying to write a custom loss function in Keras for dice_error_coefficient. It has an implementation in tensorboard, and I tried using the same function in Keras with TensorFlow, but it keeps returning a NoneType when I call model.train_on_batch or model.fit, whereas it gives proper values when used as a metric in the model. Can someone please help me figure out what I should do? I have tried following libraries such as Keras-FCN by ahundt, which uses custom loss functions, but none of it seems to work. The target and output in the code are y_true and y_pred respectively, as used in the losses.py file in Keras.
def dice_hard_coe(target, output, threshold=0.5, axis=[1,2], smooth=1e-5):
    """References
    -----------
    - `Wiki-Dice <https://en.wikipedia.org/wiki/Sørensen–Dice_coefficient>`_
    """
    output = tf.cast(output > threshold, dtype=tf.float32)
    target = tf.cast(target > threshold, dtype=tf.float32)
    inse = tf.reduce_sum(tf.multiply(output, target), axis=axis)
    l = tf.reduce_sum(output, axis=axis)
    r = tf.reduce_sum(target, axis=axis)
    hard_dice = (2. * inse + smooth) / (l + r + smooth)
    hard_dice = tf.reduce_mean(hard_dice)
    return hard_dice
There are two steps in implementing a parameterized custom loss function in Keras. First, writing a method for the coefficient/metric. Second, writing a wrapper function to format things the way Keras needs them to be.
It's actually quite a bit cleaner to use the Keras backend instead of tensorflow directly for simple custom loss functions like DICE. Here's an example of the coefficient implemented that way:
import keras.backend as K
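def dice_coef(y_true, y_pred, smooth, thresh):
    # the comparison yields a boolean tensor; cast it back to float
    # so it can be multiplied with y_true
    y_pred = K.cast(K.greater(y_pred, thresh), K.floatx())
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)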
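One possible reason a thresholded Dice works as a metric but not as a loss (this is my assumption, the thread does not confirm it) is that thresholding makes the value piecewise constant in y_pred, so there is nothing for the optimizer to follow. Here is a framework-free sketch in plain Python that compares the finite-difference slope of a hard and a soft Dice. The names hard_dice, soft_dice and finite_diff are made up for this illustration and are not Keras functions.

```python
def hard_dice(y_true, y_pred, smooth=1e-5, thresh=0.5):
    """Dice on thresholded predictions (piecewise constant in y_pred)."""
    y_bin = [1.0 if p > thresh else 0.0 for p in y_pred]
    inter = sum(t * p for t, p in zip(y_true, y_bin))
    return (2.0 * inter + smooth) / (sum(y_true) + sum(y_bin) + smooth)


def soft_dice(y_true, y_pred, smooth=1e-5):
    """Dice on raw probabilities (varies smoothly with y_pred)."""
    inter = sum(t * p for t, p in zip(y_true, y_pred))
    return (2.0 * inter + smooth) / (sum(y_true) + sum(y_pred) + smooth)


def finite_diff(fn, y_true, y_pred, index, eps=1e-4):
    """Central finite-difference slope of fn with respect to y_pred[index]."""
    up = list(y_pred)
    up[index] += eps
    down = list(y_pred)
    down[index] -= eps
    return (fn(y_true, up) - fn(y_true, down)) / (2.0 * eps)


y_true = [1.0, 0.0, 1.0, 1.0]
y_pred = [0.9, 0.2, 0.7, 0.3]

# A small nudge to y_pred[0] does not cross the threshold, so the hard
# version does not move at all, while the soft version does.
hard_slope = finite_diff(hard_dice, y_true, y_pred, index=0)
soft_slope = finite_diff(soft_dice, y_true, y_pred, index=0)
```

If that is indeed the cause, a common workaround is to train on the soft (unthresholded) Dice and keep the thresholded one for reporting only.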
Now for the tricky part. Keras loss functions must only take (y_true, y_pred) as parameters. So we need a separate function that returns another function.
def dice_loss(smooth, thresh):
    def dice(y_true, y_pred):
        # negate the coefficient so that minimizing the loss maximizes Dice
        return -dice_coef(y_true, y_pred, smooth, thresh)
    return dice
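To see the "function that returns a function" pattern on its own, here is a small plain-Python sketch. make_scaled_loss and its scale parameter are hypothetical names used only for this example; the point is just that the returned callable exposes exactly (y_true, y_pred) while the extra parameter is captured by the closure.

```python
import inspect


def make_scaled_loss(scale):
    """Hypothetical factory: binds `scale` and returns a two-argument loss."""
    def loss(y_true, y_pred):
        # mean squared error over a flat list, multiplied by the bound scale
        sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
        return scale * sq_err / len(y_true)
    return loss


loss_fn = make_scaled_loss(scale=2.0)

# The returned function only takes the two arguments Keras expects.
param_names = list(inspect.signature(loss_fn).parameters)
value = loss_fn([1.0, 0.0], [0.5, 0.5])
```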
Finally, you can use it as follows in Keras compile.
# build model
model = my_model()
# get the loss function
model_dice = dice_loss(smooth=1e-5, thresh=0.5)
# compile model
model.compile(loss=model_dice)
According to the documentation, you can use a custom loss function like this:
Any callable with the signature loss_fn(y_true, y_pred) that returns an array of losses (one per sample in the input batch) can be passed to compile() as a loss. Note that sample weighting is automatically supported for any such loss.
As a simple example:
def my_loss_fn(y_true, y_pred):
    squared_difference = tf.square(y_true - y_pred)
    return tf.reduce_mean(squared_difference, axis=-1) # Note the `axis=-1`
model.compile(optimizer='adam', loss=my_loss_fn)
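To make the `axis=-1` remark concrete, here is a plain-Python sketch (no TensorFlow needed) of what reducing over the last axis does: it returns one loss value per sample rather than a single scalar for the whole batch. per_sample_mse is a made-up name for this illustration.

```python
def per_sample_mse(y_true, y_pred):
    """Mean squared error over the last axis: one value per row (sample)."""
    return [
        sum((t - p) ** 2 for t, p in zip(row_t, row_p)) / len(row_t)
        for row_t, row_p in zip(y_true, y_pred)
    ]


# A batch of 3 samples with 2 outputs each.
y_true = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_pred = [[1.0, 0.0], [0.5, 0.5], [0.0, 0.0]]

losses = per_sample_mse(y_true, y_pred)
```

Keeping the batch dimension is what lets per-sample weighting be applied afterwards.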
Complete example:
import tensorflow as tf
import numpy as np
def my_loss_fn(y_true, y_pred):
    squared_difference = tf.square(y_true - y_pred)
    return tf.reduce_mean(squared_difference, axis=-1) # Note the `axis=-1`
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1)])
model.compile(optimizer='adam', loss=my_loss_fn)
x = np.random.rand(1000)
y = x**2
history = model.fit(x, y, epochs=10)
In addition, you can extend an existing loss function by inheriting from it. For example, to mask out entries in BinaryCrossentropy:
class MaskedBinaryCrossentropy(tf.keras.losses.BinaryCrossentropy):
    def call(self, y_true, y_pred):
        # entries labelled -1 are treated as "ignore" and dropped
        mask = y_true != -1
        y_true = y_true[mask]
        y_pred = y_pred[mask]
        return super().call(y_true, y_pred)
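As a sanity check of the masking idea, here is a plain-Python sketch that computes binary cross-entropy only over entries whose label is not -1. masked_bce is a hypothetical helper, and returning 0.0 when every entry is masked is my own assumption, not something the Keras class above is documented to do.

```python
import math


def masked_bce(y_true, y_pred, mask_value=-1, eps=1e-7):
    """Binary cross-entropy averaged over entries whose label is not masked."""
    kept = [
        (t, min(max(p, eps), 1.0 - eps))  # clip to avoid log(0)
        for t, p in zip(y_true, y_pred)
        if t != mask_value
    ]
    if not kept:
        return 0.0  # assumption: an all-masked batch contributes no loss
    total = sum(t * math.log(p) + (1 - t) * math.log(1.0 - p) for t, p in kept)
    return -total / len(kept)


y_true = [1, -1, 0, -1]
y_pred = [0.8, 0.1, 0.2, 0.9]

# Only positions 0 and 2 contribute; the two masked predictions are ignored.
loss = masked_bce(y_true, y_pred)
```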
A good starting point is the custom loss guide: https://www.tensorflow.org/guide/keras/train_and_evaluate#custom_losses
I want to augment the conventional cross-entropy loss with an additional loss term as follows:
y_pred = softmax(Wh_t / tau)
augmented_loss = D_KL(y_true || y_pred)
total_loss = crossentropy_loss + alpha*augmented_loss
Here alpha and tau are hyperparameters. The augmented loss is similar to the cross-entropy loss; the difference is that the logits (Wh_t) are divided by the hyperparameter tau before being passed to the softmax function.
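To pin down the arithmetic, here is a framework-free sketch of the three formulas above in plain Python. It assumes the loss receives raw logits (Wh_t), which in Keras would mean the last layer has no softmax activation; the function names and the default values of alpha and tau are illustrative choices of mine, not taken from any library.

```python
import math


def softmax(logits, tau=1.0):
    """Softmax of logits / tau, shifted by the max for numerical stability."""
    scaled = [z / tau for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, probs, eps=1e-12):
    return -sum(t * math.log(p + eps) for t, p in zip(y_true, probs))


def kl_divergence(y_true, probs, eps=1e-12):
    """D_KL(y_true || probs); terms with y_true == 0 contribute nothing."""
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(y_true, probs)
        if t > 0
    )


def total_loss(y_true, logits, alpha=0.5, tau=2.0):
    ce = cross_entropy(y_true, softmax(logits))         # ordinary softmax
    aug = kl_divergence(y_true, softmax(logits, tau))   # tempered softmax
    return ce + alpha * aug


y_true = [0.0, 1.0, 0.0]
logits = [1.0, 3.0, 0.5]
loss = total_loss(y_true, logits)
```

Dividing by tau > 1 flattens the distribution, so for a one-hot target the augmented term is larger than the plain cross-entropy on the same logits.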
So I want to define a customised loss that works on logits instead of probabilities. I have read the TensorFlow tutorial, which shows something like this:
def custom_mean_squared_error(y_true, y_pred):
    return tf.math.reduce_mean(tf.square(y_true - y_pred))
class CustomMSE(keras.losses.Loss):
    def __init__(self, regularization_factor=0.1, name="custom_mse"):
        super().__init__(name=name)
        self.regularization_factor = regularization_factor
    def call(self, y_true, y_pred):
        mse = tf.math.reduce_mean(tf.square(y_true - y_pred))
        reg = tf.math.reduce_mean(tf.square(0.5 - y_pred))
        return mse + reg * self.regularization_factor
But there y_pred is a probability, not the logits. I have spent a day reading the source code and looking through every related blog post, but I have not found the answer.
Here is my model which customize the conventional crossentropy loss:
class Customloss(keras.losses.Loss):
    def __init__(self, name="custom_crossentropy"):
        super().__init__(name=name)
    def call(self, y_true, y_pred):
        return -tf.reduce_sum(tf.math.reduce_mean(tf.math.log(y_pred)*y_true, 0))
model = keras.Sequential([
    layers.Embedding(input_dim=cfg.vocab_size, output_dim=cfg.embedding_size,
                     input_length=cfg.seq_length),
    layers.Bidirectional(layers.LSTM(200, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(200)),
    layers.Dense(cfg.hidden_unit, activation='relu'),
    layers.Dropout(cfg.keep_prob),
    layers.Dense(cfg.num_classes, activation='softmax')
])
model.compile(optimizer=keras.optimizers.Adam(),
              loss=Customloss(),  # pass an instance, not the class itself
              metrics=['accuracy',
                       keras.metrics.Recall()])
Can anyone help me define a custom loss function using logits in tensorflow?
Thank you so much!
I am currently experimenting with generative adversarial networks in Keras.
As proposed in this paper, I want to use the historical averaging loss function, meaning that I want to penalize changes in the network weights.
I am not sure how to implement it in a clever way.
I was implementing the custom loss function according to the answer to this post.
def historical_averaging_wrapper(current_weights, prev_weights):
    def historical_averaging(y_true, y_pred):
        diff = 0
        for i in range(len(current_weights)):
            diff += abs(np.sum(current_weights[i]) + np.sum(prev_weights[i]))
        return K.binary_crossentropy(y_true, y_pred) + diff
    return historical_averaging
The weights of the network are penalized, and the weights are changing after each batch of data.
My first idea was to update the loss function after each batch.
Roughly like this:
prev_weights = model.get_weights()
# integer division so that range() receives an int
for i in range(len(training_data) // batch_size):
    current_weights = model.get_weights()
    model.compile(loss=historical_averaging_wrapper(current_weights, prev_weights), optimizer='adam')
    model.fit(training_data[i*batch_size:(i+1)*batch_size], training_labels[i*batch_size:(i+1)*batch_size], epochs=1, batch_size=batch_size)
    prev_weights = current_weights
Is this reasonable? That approach seems to be a bit "messy" in my opinion.
Is there another possibility to do this in a "smarter" way?
Like maybe updating the loss function in a data generator and use fit_generator()?
Thanks in advance.
Loss functions are operations on the graph using tensors.
You can define additional tensors in the loss function to hold previous values. This is an example:
import numpy as np
import tensorflow as tf
import tensorflow.keras.backend as K
keras = tf.keras
class HistoricalAvgLoss(object):
    def __init__(self, model):
        # keep a reference to the model so the loss does not rely on a global
        self.model = model
        # create tensors (initialized to zero) to hold the previous value of the
        # weights
        self.prev_weights = []
        for w in model.get_weights():
            self.prev_weights.append(K.variable(np.zeros(w.shape)))
    def loss(self, y_true, y_pred):
        err = keras.losses.mean_squared_error(y_true, y_pred)
        werr = [K.mean(K.abs(c - p)) for c, p in zip(self.model.get_weights(), self.prev_weights)]
        self.prev_weights = K.in_train_phase(
            [K.update(p, c) for c, p in zip(self.model.get_weights(), self.prev_weights)],
            self.prev_weights
        )
        return K.in_train_phase(err + K.sum(werr), err)
The variable prev_weights holds the previous values. Note that we added a K.update operation after the weight errors are calculated.
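The bookkeeping is easier to follow outside the graph. Here is a plain-Python sketch of the same idea: compute the penalty against the stored weights first, then overwrite the stored copy. WeightChangePenalty is a hypothetical helper written for this illustration, with weights represented as flat lists; it is not part of Keras.

```python
class WeightChangePenalty:
    """Hypothetical helper: mean absolute weight change since the last call."""

    def __init__(self, initial_weights):
        self.prev = [list(w) for w in initial_weights]

    def __call__(self, current_weights):
        penalty = 0.0
        for cur, prev in zip(current_weights, self.prev):
            penalty += sum(abs(c - p) for c, p in zip(cur, prev)) / len(cur)
        # Update the stored copy only after the penalty has been computed,
        # mirroring the order of operations in the loss above.
        self.prev = [list(w) for w in current_weights]
        return penalty


penalty = WeightChangePenalty([[0.0, 0.0], [1.0]])

first = penalty([[0.5, -0.5], [1.0]])   # weights moved since construction
second = penalty([[0.5, -0.5], [1.0]])  # nothing changed since the last call
```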
A sample model for testing:
model = keras.models.Sequential([
keras.layers.Input(shape=(4,)),
keras.layers.Dense(8),
keras.layers.Dense(4),
keras.layers.Dense(1),
])
loss_obj = HistoricalAvgLoss(model)
model.compile('adam', loss_obj.loss)
model.summary()
Some test data and objective function:
import numpy as np
def test_fn(x):
    return x[0]*x[1] + 2.0 * x[1]**2 + x[2]/x[3] + 3.0 * x[3]
X = np.random.rand(1000, 4)
y = np.apply_along_axis(test_fn, 1, X)
hist = model.fit(X, y, validation_split=0.25, epochs=10)
The model losses decrease over time, in my test.
I am trying to build a custom loss function that takes the previous output of the network (the output from the previous iteration) and uses it together with the current output.
Here is what I am trying to do, but I don't know how to complete it:
def l_loss(prev_output):
    def loss(y_true, y_pred):
        pix_loss = K.mean(K.square(y_pred - y_true), axis=-1)
        pase = K.variable(100)
        diff = K.mean(K.abs(prev_output - y_pred))
        movement_loss = K.abs(pase - diff)
        total_loss = pix_loss + movement_loss
        return total_loss
    return loss
self.model.compile(optimizer=Adam(0.001, beta_1=0.5, beta_2=0.9),
                   loss=l_loss(?))
I hope you can help me.
This is what I tried:
import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import *
from tensorflow.keras.models import Sequential
from tensorflow.keras import backend as K
class MovementLoss(object):
    def __init__(self):
        self.var = None
    def __call__(self, y_true, y_pred, sample_weight=None):
        mse = K.mean(K.square(y_true - y_pred), axis=-1)
        if self.var is None:
            # one slot per sample; 32 matches the batch size used below
            z = np.zeros((32,))
            self.var = K.variable(z)
        delta = K.update(self.var, mse - self.var)
        return mse + delta
def make_model():
    model = Sequential()
    model.add(Dense(1, input_shape=(4,)))
    loss = MovementLoss()
    model.compile('adam', loss)
    return model
model = make_model()
model.summary()
Using some example test data:
import numpy as np
X = np.random.rand(32, 4)
POLY = [1.0, 2.0, 0.5, 3.0]
def test_fn(xi):
    return np.dot(xi, POLY)
Y = np.apply_along_axis(test_fn, 1, X)
history = model.fit(X, Y, epochs=4)
I do see the loss oscillate in a way that appears to be influenced by the last batch's delta. Note that the details of the loss function do not match your application.
The crucial step is that the K.update step must be part of the graph (as far as I understand it).
That is achieved by:
delta = K.update(var, delta)
return x + delta
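For the arithmetic of the loss from the question (pixel loss plus a term that compares the current output with the previous one), here is a plain-Python sketch with the state kept on a callable object. PrevOutputLoss and target_movement (the question's pase) are illustrative names, and using a movement term of 0 on the very first call is my own assumption, since there is no previous output yet.

```python
class PrevOutputLoss:
    """Hypothetical stateful loss that remembers the previous prediction."""

    def __init__(self, target_movement=100.0):
        self.target_movement = target_movement
        self.prev_output = None

    def __call__(self, y_true, y_pred):
        n = len(y_true)
        pix_loss = sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n
        if self.prev_output is None:
            movement_loss = 0.0  # assumption: no penalty before any history
        else:
            diff = sum(abs(q - p) for q, p in zip(self.prev_output, y_pred)) / n
            movement_loss = abs(self.target_movement - diff)
        # Store the current prediction for the next call.
        self.prev_output = list(y_pred)
        return pix_loss + movement_loss


loss_fn = PrevOutputLoss(target_movement=1.0)
loss1 = loss_fn([1.0, 1.0], [0.0, 0.0])  # first call: pixel loss only
loss2 = loss_fn([1.0, 1.0], [0.5, 0.5])  # second call: adds the movement term
```

In a real Keras graph the stored value has to be a backend variable updated inside the graph, as described above; this sketch only shows the order of operations.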
I am trying to code up an implementation of the variational autoencoder, however I am facing some difficulties regarding the loss function:
def vae_loss(sigma, mu):
    def loss(y_true, y_pred):
        recon = K.sum(K.binary_crossentropy(y_true, y_pred), axis=-1)
        kl = 0.5 * K.sum(K.exp(sigma) + K.square(mu) - 1. - sigma, axis=-1)
        return recon + kl
    return loss
The binary crossentropy part works fine, but whenever I return only the divergence term kl for testing I get the following error:
ValueError: "Tried to convert 'x' to a tensor and failed. Error: None values not supported.".
I am looking forward to possible hints as to what I have done wrong. You will find my entire code below. Thank you for your time!
import numpy as np
from keras import Model
from keras.layers import Input, Dense, Lambda
import keras.backend as K
from keras.datasets import mnist
from matplotlib import pyplot as plt
class VAE(object):
    def __init__(self, n_latent, batch_size):
        self.encoder, self.encoder_input, self.mu, self.sigma = self.create_encoder(n_latent, batch_size)
        self.decoder, self.decoder_input, self.decoder_output = self.create_decoder(n_latent, batch_size)
        pipeline = self.decoder(self.encoder.outputs[0])
        def vae_loss(sigma, mu):
            def loss(y_true, y_pred):
                recon = K.sum(K.binary_crossentropy(y_true, y_pred), axis=-1)
                kl = 0.5 * K.sum(K.exp(sigma) + K.square(mu) - 1. - sigma, axis=-1)
                return recon + kl
            return loss
        self.VAE = Model(self.encoder_input, pipeline)
        self.VAE.compile(optimizer="adadelta", loss=vae_loss(self.sigma, self.mu))
    def create_encoder(self, n_latent, batch_size):
        input_layer = Input(shape=(784,))
        #net = Dense(512, activation="relu")(input_layer)
        mu = Dense(n_latent, activation="linear")(input_layer)
        print(mu)
        sigma = Dense(n_latent, activation="linear")(input_layer)
        def sample_z(args):
            mu, log_sigma = args
            eps = K.random_normal(shape=(K.shape(input_layer)[0], n_latent), mean=0., stddev=1.)
            K.print_tensor(K.shape(eps))
            return mu + K.exp(log_sigma / 2) * eps
        sample_z = Lambda(sample_z)([mu, sigma])
        model = Model(inputs=input_layer, outputs=[sample_z, mu, sigma])
        return model, input_layer, mu, sigma
    def create_decoder(self, n_latent, batch_size):
        input_layer = Input(shape=(n_latent,))
        #net = Dense(512, activation="relu")(input_layer)
        reconstruct = Dense(784, activation="linear")(input_layer)
        model = Model(inputs=input_layer, outputs=reconstruct)
        return model, input_layer, reconstruct
I am going to assume the error appears when you are "testing"/debugging your training phase, during backpropagation (let me know if I am wrong).
If so, the problem is that you are asking Keras to optimize your whole network (model.VAE.fit(...)) while using a loss (kl) covering only the encoder part. The gradients for the decoder stay undefined (without a loss like recon covering it), causing the optimization error.
For your debugging purpose, the error would disappear if you try to compile and fit only the encoder with this amputated loss (kl), or if you come up with a dummy (differentiable) loss covering also the decoder (e.g. K.sum(y_pred - y_pred, axis=-1) + kl).
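To illustrate the point in plain Python: the kl term from the question depends only on mu and sigma (which the question's sample_z treats as a log-variance), so a loss made of kl alone does not react to the decoder output at all. The helper names below are made up for this sketch.

```python
import math


def kl_term(mu, log_var):
    """0.5 * sum(exp(log_var) + mu^2 - 1 - log_var), as in the question."""
    return 0.5 * sum(math.exp(s) + m * m - 1.0 - s for m, s in zip(mu, log_var))


def kl_only_loss(y_true, y_pred, mu, log_var):
    # y_true and y_pred are deliberately unused: this is the "amputated"
    # loss that covers only the encoder.
    return kl_term(mu, log_var)


kl_standard = kl_term([0.0, 0.0], [0.0, 0.0])  # mu = 0, log_var = 0
kl_shifted = kl_term([1.0, -1.0], [0.0, 0.0])  # shifted means

# Two very different decoder outputs give the same loss value.
loss_a = kl_only_loss([1.0], [0.2], [1.0, -1.0], [0.0, 0.0])
loss_b = kl_only_loss([1.0], [0.9], [1.0, -1.0], [0.0, 0.0])
```

Since the value never changes with y_pred, there is no gradient to pass to the decoder weights, which matches the explanation above.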
I'm trying to implement sentence similarity architecture based on this work using the STS dataset. Labels are normalized similarity scores from 0 to 1 so it is assumed to be a regression model.
My problem is that the loss goes directly to NaN starting from the first epoch. What am I doing wrong?
I have already tried updating to latest keras and theano versions.
The code for my model is:
def create_lstm_nn(input_dim):
    seq = Sequential()
    # embed using pretrained 300d embedding
    seq.add(Embedding(vocab_size, emb_dim, mask_zero=True, weights=[embedding_weights]))
    # encode via LSTM
    seq.add(LSTM(128))
    seq.add(Dropout(0.3))
    return seq
lstm_nn = create_lstm_nn(input_dim)
input_a = Input(shape=(input_dim,))
input_b = Input(shape=(input_dim,))
processed_a = lstm_nn(input_a)
processed_b = lstm_nn(input_b)
cos_distance = merge([processed_a, processed_b], mode='cos', dot_axes=1)
cos_distance = Reshape((1,))(cos_distance)
distance = Lambda(lambda x: 1-x)(cos_distance)
model = Model(input=[input_a, input_b], output=distance)
# train
rms = RMSprop()
model.compile(loss='mse', optimizer=rms)
model.fit([X1, X2], y, validation_split=0.3, batch_size=128, nb_epoch=20)
I also tried using a simple Lambda instead of the Merge layer, but it has the same result.
def cosine_distance(vests):
    x, y = vests
    x = K.l2_normalize(x, axis=-1)
    y = K.l2_normalize(y, axis=-1)
    return -K.mean(x * y, axis=-1, keepdims=True)
def cos_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0],1)
distance = Lambda(cosine_distance, output_shape=cos_dist_output_shape)([processed_a, processed_b])
NaN is a common issue in deep learning regression. Because you are using a Siamese network, you can try the following:
- Check your data: does it need to be normalized?
- Try adding a Dense layer as the last layer of your network, but be careful when picking the activation function (e.g. relu).
- Try another loss function, e.g. contrastive_loss.
- Lower your learning rate, e.g. to 0.0001.
- The cos mode does not carefully handle division by zero, which might be the cause of the NaN.
It is not easy to make deep learning work perfectly.
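On the division-by-zero point, here is a plain-Python sketch of a cosine distance that clamps each norm to a small epsilon, so an all-zero vector gives a finite result instead of NaN. The function and the value of eps are my own choices for illustration (1e-7 is the usual default of K.epsilon()), and whether zero vectors are actually the cause in the question is not confirmed.

```python
import math


def cosine_distance(x, y, eps=1e-7):
    """1 - cos(x, y), with each norm clamped to at least eps."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = max(math.sqrt(sum(a * a for a in x)), eps)
    norm_y = max(math.sqrt(sum(b * b for b in y)), eps)
    return 1.0 - dot / (norm_x * norm_y)


same = cosine_distance([1.0, 2.0], [2.0, 4.0])  # parallel vectors
orth = cosine_distance([1.0, 0.0], [0.0, 1.0])  # orthogonal vectors
zero = cosine_distance([0.0, 0.0], [1.0, 2.0])  # zero vector stays finite
```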
I didn't run into the NaN issue, but my loss wouldn't change. I found this, check it out:
def cosine_distance(shapes):
    y_true, y_pred = shapes
    def l2_normalize(x, axis):
        norm = K.sqrt(K.sum(K.square(x), axis=axis, keepdims=True))
        return K.sign(x) * K.maximum(K.abs(x), K.epsilon()) / K.maximum(norm, K.epsilon())
    y_true = l2_normalize(y_true, axis=-1)
    y_pred = l2_normalize(y_pred, axis=-1)
    return K.mean(1 - K.sum((y_true * y_pred), axis=-1))