Writing a custom loss function without y_true in Keras - python

I'm implementing a triplet loss function in Keras. In general, loss functions take the predicted values and the ground truth as arguments, but triplet loss doesn't use labels, just the model output. I tried to write the function with just one parameter:
def triplet_loss(y_pred):
    margin = 1
    return K.mean(K.square(y_pred[0]) - K.square(y_pred[1]) + margin)
It failed saying triplet_loss() takes 1 argument but two arguments were given (in score_array = fn(y_true, y_pred)). When I write the function with two arguments, y_true and y_pred, the program runs without error. Why is that? Should I implement the function with both arguments even though y_true won't be used? Is this correct, or is there another way of doing it?

Well.... simply don't use the ground truth:
def triplet_loss(y_true, y_pred):
    # all your code as it is
It's not very usual to train networks without ground truth: when we expect a model to learn something, there very often is a ground truth. If you don't need it, simply ignore it.
Also, if y_true is ignored, what are you passing to the fit method? Just a dummy array?
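A minimal sketch of that pattern, with the loss body kept from the question and a dummy label array passed to fit (the model and data names are placeholders, not from the original post):
import numpy as np
from tensorflow.keras import backend as K

def triplet_loss(y_true, y_pred):
    # y_true is required by Keras' signature but never used.
    margin = 1
    return K.mean(K.square(y_pred[0]) - K.square(y_pred[1]) + margin)

# Since y_true is ignored, any array of matching length works as "labels":
#   dummy_y = np.zeros((len(x_train), 1))
#   model.compile(optimizer='adam', loss=triplet_loss)
#   model.fit(x_train, dummy_y)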

Alternatively, implement it via the K.function method:
output_tensor = your_model(input_tensor)
total_loss = K.mean(K.abs(input_tensor - output_tensor))

nn_train = K.function(
    [input_tensor], [total_loss],
    Adam(lr=5e-5, beta_1=0.5, beta_2=0.999).get_updates(
        total_loss, your_model.trainable_weights))

loss, = nn_train([input])

Related

Check failed: 1 == NumElements() (1 vs. 2)Must have a one element tensor (Neural network metric)

I have my neural network in TF2 and I want to write my own metric for it. In my function I iterate over each tensor value and calculate a new value into output_list, which I then stack as my new y_pred and pass into mean_absolute_error. Compilation is OK, but in the first iteration I get the error in the title. What am I doing wrong?
@tf.function
def custom_metric_mae(y_true, y_pred):
    output_list = tf.TensorArray(dtype=tf.float32, size=tf.shape(y_pred))
    for i in range(223):
        dphi = abs(y_true[i][0] - y_pred[i][0])
        if dphi > 0.5:
            output_list.write(i, 1 - dphi)
        else:
            output_list.write(i, dphi)
    y_PredChanged = output_list.stack()
    return tf.metrics.mean_absolute_error(y_true, y_PredChanged)
My model:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(32, 32)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="linear")
])
model.compile(optimizer="adam", loss="mean_absolute_error", metrics=[custom_metric_mae])
From the documentation of Keras' custom metrics:
The function would need to take (y_true, y_pred) as arguments and return a single tensor value.
tf.metrics.mean_absolute_error returns two values: the actual MAE and an update_op that, once evaluated, updates the running values and returns the MAE. I'm not entirely sure whether this is compatible with Keras, so I'd suggest replacing it with keras.metrics.mae instead.
Do note that when using Keras's mae you need to average the final result: the function is applied to the last dimension of y_true and y_pred, and the result has shape y_true.shape[:-1], which in general won't be a single value. To do so, use tf.math.reduce_mean:
return tf.math.reduce_mean(keras.metrics.mae(y_true, y_PredChanged))
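Putting the two suggestions together, a minimal sketch of the metric (the wrapping logic is kept from the question; replacing the TensorArray loop with tf.where is my own assumption, not part of the original answer):
import tensorflow as tf
from tensorflow import keras

def custom_metric_mae(y_true, y_pred):
    # Per-element absolute difference, as in the question's dphi.
    dphi = tf.abs(y_true - y_pred)
    # Wrap values above 0.5, mirroring the original if/else but vectorised.
    y_pred_changed = tf.where(dphi > 0.5, 1.0 - dphi, dphi)
    # keras.metrics.mae reduces the last axis; reduce_mean collapses the rest
    # so the metric returns a single tensor value.
    return tf.math.reduce_mean(keras.metrics.mae(y_true, y_pred_changed))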

How can I predict the expected value and the variance simultaneously with a neural network?

I'd like to use a neural network to predict a scalar value which is the sum of a function of the input values and a random value (I'm assuming gaussian distribution) whose variance also depends on the input values. Now I'd like to have a neural network that has two outputs - the first output should approximate the deterministic part - the function, and the second output should approximate the variance of the random part, depending on the input values. What loss function do I need to train such a network?
(It would be nice if there was an example with Python for Tensorflow, but I'm also interested in general answers. I'm also not quite clear how I could write something like that in Python code; none of the examples I found so far show how to address individual outputs from the loss function.)
You can use dropout for that. With a dropout layer you can make several different predictions, each based on a different set of dropped-out nodes. Then you can look at the spread of those outcomes and interpret it as a measure of uncertainty.
For details, read:
Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning." International Conference on Machine Learning, 2016.
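A minimal sketch of that idea, assuming TF2 eager execution and a tf.keras model that contains Dropout layers (calling the model with training=True keeps dropout active at prediction time):
import numpy as np

def mc_dropout_predict(model, x, n_samples=50):
    # Run the model repeatedly with dropout active and collect the predictions.
    preds = np.stack([model(x, training=True).numpy() for _ in range(n_samples)])
    # The spread across samples serves as an uncertainty estimate.
    return preds.mean(axis=0), preds.var(axis=0)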
Since I've found nothing simple to implement, I wrote something myself that models this explicitly: here is a custom loss function that tries to predict the mean and variance. It seems to work, but I'm not quite sure how well it works out in practice, and I'd appreciate feedback. This is my loss function:
import tensorflow as tf
from tensorflow.python.ops import math_ops
from tensorflow.keras import backend as K

def meanAndVariance(y_true: tf.Tensor, y_pred: tf.Tensor) -> tf.Tensor:
    """Loss function that has the values of the last axis in y_pred
    approximate the mean and variance of each value in the last axis of y_true."""
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = math_ops.cast(y_true, y_pred.dtype)
    mean = y_pred[..., 0::2]
    variance = y_pred[..., 1::2]
    res = K.square(mean - y_true) + K.square(variance - K.square(mean - y_true))
    return K.mean(res, axis=-1)
The output dimension is twice the label dimension: the mean and variance for each value in the label. The loss function consists of two parts: a mean-squared-error term that pushes the predicted mean towards the label value, and a term that pushes the predicted variance towards the squared difference between the label and the predicted mean.
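For context, a usage sketch (the layer sizes, input shape, and label_dim are placeholders; the only requirement is that the last layer has twice as many units as the label, with mean and variance interleaved as the loss expects):
from tensorflow import keras

label_dim = 3                                # placeholder label size
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    keras.layers.Dense(2 * label_dim)        # [mean_0, var_0, mean_1, var_1, ...]
])
model.compile(optimizer="adam", loss=meanAndVariance)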
When using dropout to estimate the uncertainty (or any other stochastic regularization method), make sure to also check out our recent work on a sampling-free approximation of Monte-Carlo dropout:
https://arxiv.org/pdf/1908.00598.pdf
We essentially follow your idea: treat the activations as random variables and then propagate mean and variance to the output layer using error propagation. Consequently, we obtain two outputs - the mean and the variance.

Implementing Intersection over Union Loss Using Tensorflow

This may be more of a Tensorflow gradient question. I have been attempting to implement Intersection over Union (IoU) as a loss and have been running into some problems. To the point, here is the snippet of my code that computes the IoU:
def get_iou(masks, predictions):
    ious = []
    for i in range(batch_size):
        mask = masks[i]
        pred = predictions[i]
        masks_sum = tf.reduce_sum(mask)
        predictions_sum = tf.reduce_mean(pred)
        intersection = tf.reduce_sum(tf.multiply(mask, pred))
        union = masks_sum + predictions_sum - intersection
        iou = intersection / union
        ious.append(iou)
    return ious
iou = get_iou(masks, predictions)
mean_iou_loss = -tf.log(tf.reduce_sum(iou))
train_op = tf.train.AdamOptimizer(0.001).minimize(mean_iou_loss)
It works as expected. However, the issue I am having is that the loss does not decrease. The model does train, though the results are less than ideal, so I am wondering if I am implementing it correctly. Do I have to compute the gradients myself? I can compute the gradients for this IoU loss derived by this paper using tf.gradients(), though I am not sure how to incorporate that with tf.train.AdamOptimizer(). Reading the documentation, I feel like compute_gradients and apply_gradients are the commands that I need to use, but I can't find any examples on how to use them. My understanding is that the Tensorflow graph should be able to come up with the gradient itself via the chain rule, so is a custom gradient even necessary in this problem? If a custom gradient is not necessary, then I may just have an ill-posed problem and need to adjust some hyperparameters.
Note: I have tried Tensorflow's implementation of the IoU, tf.metrics.mean_iou(), but it spits out inf every time so I have abandoned that.
Gradient computation occurs inside the optimizer.minimize function, so no explicit use inside the loss function is needed. However, your implementation simply lacks an optimizable, trainable variable.
iou = get_iou(masks, predictions)
mean_iou_loss = tf.Variable(initial_value=-tf.log(tf.reduce_sum(iou)), name='loss', trainable=True)
train_op = tf.train.AdamOptimizer(0.001).minimize(mean_iou_loss)
Numerical stability, differentiability and particular implementation aside, this should be enough to use it as a loss function, which will change with iterations.
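If you do want to inspect or modify the gradients explicitly, as the question mentions, minimize is in TF1-style code roughly equivalent to compute_gradients followed by apply_gradients; a sketch using the mean_iou_loss above:
optimizer = tf.train.AdamOptimizer(0.001)
# Returns a list of (gradient, variable) pairs for all trainable variables.
grads_and_vars = optimizer.compute_gradients(mean_iou_loss)
# Gradients could be clipped or otherwise modified here before applying them.
train_op = optimizer.apply_gradients(grads_and_vars)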
Also take a look:
https://arxiv.org/pdf/1902.09630.pdf
Why does one not use IOU for training?

How to obtain the gradient of the categorical_cross_entropy on Keras using Tensorflow backend?

I'm trying to obtain the gradient of the loss objective, in my case categorical_cross_entropy, w.r.t. NN parameters such as weights and biases.
The reason for this is that I want to implement a callback function with the above as the base, with which I could debug the model while it's training.
So, here's the problem.
I'm currently using generator methods to fit, evaluate and predict on the dataset.
The categorical_cross_entropy loss function in Keras is implemented as follows:
def categorical_crossentropy(y_true, y_pred):
    return K.categorical_crossentropy(y_true, y_pred)
The only way I can get my hands on y_pred is if I evaluate/predict at the end of training my model.
So, what I'm asking is the following:
Is there a way for me to create a callback as mentioned above?
If anyone has already implemented a callback like the one above using categorical_cross_entropy, please let me know how to make it work.
Lastly, how to compute the numeric gradient for the same?
Currently, this is the code I'm using to calculate the gradient, but I've no clue whether it is right or wrong. Link.
def symbolic_gradients(model, input, output):
    grads = K.gradients(model.total_loss, model.trainable_weights)
    inputs = (model.model._feed_inputs +
              model.model._feed_targets +
              model.model._feed_sample_weights)
    fn = K.function(inputs, grads)
    return fn([input, output, np.ones(len(output))])
Ideally I'd like to make this model-agnostic, but even if it's not, it's okay.
I can help with the gradient part. I am using this function to calculate the gradient of the loss function w.r.t. the output:
def get_loss_grad(model, inputs, outputs):
    x, y, sample_weight = model._standardize_user_data(inputs, outputs)
    grad_ce = K.gradients(model.total_loss, model.output)
    func = K.function(model._feed_inputs + model._feed_targets + model._feed_sample_weights,
                      grad_ce)
    return func(x + y + sample_weight)
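A hypothetical usage sketch (x_batch and y_batch are placeholder names for one batch of data; the model must already be compiled so that model.total_loss exists):
# Gradients of the loss w.r.t. the model output, one array per output tensor.
output_grads = get_loss_grad(model, x_batch, y_batch)
print(output_grads[0].shape)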

Use generator in loss function

I need to incorporate additional information into a Keras loss function that depends on the current batch. Since Keras losses only take two arguments, I considered adding this information by making the loss function call next() on a generator object. However, the generator is only called once (probably when adding the loss function in model.compile()).
Here is a sample code:
data_batches = yield_data_batches()
meta_batches = yield_meta_batches()
....

model.compile(loss=loss_function, ...)
model.fit_generator(generator=data_batches, ....)

def loss_function(x, y):
    meta_x, meta_y = next(meta_batches)
    x *= meta_x  # component-wise matrix multiplication
    y *= meta_y  # component-wise matrix multiplication
    return mse(x, y)
Is there a way to make the loss function get a new meta_batch each time it is evaluated on a data_batch? Or is there another way to incorporate this meta information into the loss function?
Clarification:
The meta_x and meta_y are binary matrices that should cancel out certain elements of the prediction, since those elements should not count towards the loss.
For example:
y_true = (a,b,c,0)
y_pred = (d,e,f,g)
y_meta = (1,1,1,0)
Now, y_pred*y_meta should cancel out g so that it does not count towards the loss.
This does not work, since the loss function will be compiled and added to the compute graph; your loss function may only depend on y_pred and y_true.
You can either incorporate this information in y_true, or weight the resulting loss with sample weights.
Your approach would be equivalent to a combination of both.
Assuming positive weights a and b (you call them meta_x and meta_y):
|ax - by| = a|x - (b/a)y|, so you just weight y_pred by b/a and add the sample weight a.
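A minimal sketch of the first option, packing the binary mask into y_true and unpacking it inside the loss (the masked_mse name and the packing step are my own illustration, not an established Keras API):
from tensorflow.keras import backend as K

def masked_mse(y_true, y_pred):
    """MSE where y_true carries [labels | binary mask] side by side."""
    n = K.int_shape(y_pred)[-1]   # number of real label columns
    labels = y_true[:, :n]        # original targets
    mask = y_true[:, n:]          # binary meta matrix (1 = count, 0 = ignore)
    return K.mean(K.square((y_pred - labels) * mask), axis=-1)

# Usage (names are placeholders):
#   y_packed = np.concatenate([y_train, meta_train], axis=-1)
#   model.compile(optimizer='adam', loss=masked_mse)
#   model.fit(x_train, y_packed)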
