I'm new to Keras and neural networks in general. I'm trying to implement a custom loss function based on mean squared error for a multi-layer autoencoder to be used in anomaly detection. Basically the approach I'm going for is from here https://www.jstage.jst.go.jp/article/ipsjjip/27/0/27_335/_pdf
Unfortunately I don't have the reputation to post images as I'm also new to SO but the formula is on page 2, section 3 as Lprop
The intuition here is that I don't want the autoencoder to update weights for data points that return errors above the ap percentile of losses. This way it learns to reconstruct the inliers in the dataset while struggling with the outliers, hence detecting them as anomalous.
Here's some code I've tried and the compiled model
import keras.backend as K
c = 70.0
def mean_squared_errorx(y_true, y_pred):
    es = K.square(y_pred - y_true)
    const = np.percentile(es, c)
    w = K.cast(K.less(const, K.mean(K.square(y_pred - y_true), axis=-1)), dtype = "float32")
    return w * K.mean(K.square(y_pred - y_true), axis=-1)
#'mean_squared_error'
autoencoder.compile(optimizer=adam, loss=mean_squared_errorx)
autoencoder.fit(train, train,
                epochs=num_epochs,
                batch_size=round(len(train)/50),
                shuffle=True,
                validation_data=(train, train),
                verbose = 0)
encoded_d = encoder.predict(train)
decoded_pred = decoder.predict(encoded_d)
The idea is to get the K.less to return a bool for each error, and then to convert it to a float to serve as a weight in the return statement.
I know the np.percentile part probably won't work on a Tensor but don't know how else to accomplish the percentile ranking.
With that code I'm getting this error message
InvalidArgumentError: Incompatible shapes: [37,21] vs. [37]
[[{{node loss_25/dense_104_loss/Less}}]]
where in this case the batch size is 37 and the number of features is 21. I appreciate any feedback on this or other parts of the code - thanks!
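For reference, here is a minimal NumPy sketch of the arithmetic the loss is aiming for, not Keras code. The shape error comes from comparing a per-element quantity of shape (batch, features) with a per-sample one of shape (batch,), so the sketch computes both the threshold and the mask from the per-sample errors. Note also that K.less(const, mse) as written keeps the samples above the threshold, which is the opposite of the stated intent. The function name, the random data and the five injected outliers are assumptions made for illustration.

```python
import numpy as np

def percentile_masked_mse(y_true, y_pred, pct=70.0):
    """Per-sample MSE with samples above the pct-th percentile zeroed out."""
    # One error per sample: shape (batch,), not (batch, features)
    per_sample = np.mean(np.square(y_pred - y_true), axis=-1)
    # Threshold computed over the per-sample errors, so the shapes match
    cutoff = np.percentile(per_sample, pct)
    # 1.0 for samples at or below the cutoff (treated as inliers), else 0.0
    mask = (per_sample <= cutoff).astype(np.float32)
    return mask * per_sample

rng = np.random.default_rng(0)
y_true = rng.normal(size=(37, 21))
y_pred = y_true + rng.normal(scale=0.1, size=(37, 21))
y_pred[:5] += 3.0  # five obvious outliers (hypothetical data)
loss = percentile_masked_mse(y_true, y_pred)
print(loss.shape)          # (37,)
print(loss[:5])            # the outliers are masked to zero
```

Inside a Keras loss the percentile has to be computed with tensor operations rather than np.percentile, which is the part this sketch does not cover.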
Found a potential workaround if anybody is working on something similar
import keras.backend as K
def mean_squared_error_w(y_true, y_pred):
    mses = K.mean(K.square(y_pred - y_true), axis = -1)
    std_of_mses = K.std(mses)
    const = K.mean(mses, axis = -1) + (std_of_mses * 0.5)
    mask = K.cast(K.less(K.mean(K.square(y_pred - y_true), axis=-1), const), dtype = "float32")
    return mask * K.mean(K.square(y_pred - y_true), axis=-1)
I believe this creates a boolean tensor that is true wherever the per-sample error is below a threshold, defined as the mean of the batch MSEs plus half a standard deviation (if the errors were normally distributed, this cutoff would sit at roughly the 69th percentile, close to the 70th I was aiming for). The booleans are cast to weights of 0 or 1, and the resulting mask is applied to the output MSE loss, so samples above the threshold contribute nothing.
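The percentile estimate for a cutoff of mean plus half a standard deviation can be checked with the standard library, under the assumption stated above that the errors are normally distributed:

```python
from statistics import NormalDist

# Fraction of a normal distribution lying below mean + 0.5 * std
fraction_below = NormalDist(mu=0.0, sigma=1.0).cdf(0.5)
print(round(fraction_below, 4))  # 0.6915
```

Real reconstruction errors are usually skewed, so the actual fraction of samples kept per batch may differ from this figure.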
I am using the Image segmentation guide by fchollet to perform semantic segmentation. I modified the guide to suit my dataset by labelling the 8-bit mask images with the values 1 and 2, as in the Oxford Pets dataset (1 is subtracted inside class OxfordPets(keras.utils.Sequence), so the labels become 0 and 1).
My question is: how do I get the IoU metric for a single class (e.g. 1)?
I have tried different metrics suggested on Stack Overflow, but most answers suggest MeanIoU, and when I tried it I got a nan loss. Here is an example of a mask after applying autocontrast.
PIL.ImageOps.autocontrast(load_img(val_target_img_paths[i]))
The model seems to train well but the accuracy was decreasing over time.
Also, can someone help explain how the metric score can be calculated from y_true and y_pred? I don't quite fully understand when the label value is used in the IoU metric calculation.
I had a similar problem back then. I used jaccard_distance_loss and dice_metric. They are based on IoU. My task was a binary segmentation, so I guess you might have to modify the code in case you want to use it for a multi-label classification problem.
from keras import backend as K
import numpy as np

def jaccard_distance_loss(y_true, y_pred, smooth=100):
    """
    Jaccard = (|X & Y|)/ (|X|+ |Y| - |X & Y|)
            = sum(|A*B|)/(sum(|A|)+sum(|B|)-sum(|A*B|))
    The Jaccard distance loss is useful for unbalanced datasets. This has been
    shifted so it converges on 0 and is smoothed to avoid exploding or disappearing
    gradient.
    Ref: https://en.wikipedia.org/wiki/Jaccard_index
    #url: https://gist.github.com/wassname/f1452b748efcbeb4cb9b1d059dce6f96
    #author: wassname
    """
    intersection = K.sum(K.sum(K.abs(y_true * y_pred), axis=-1))
    sum_ = K.sum(K.sum(K.abs(y_true) + K.abs(y_pred), axis=-1))
    jac = (intersection + smooth) / (sum_ - intersection + smooth)
    return (1 - jac) * smooth

def dice_metric(y_pred, y_true):
    intersection = K.sum(K.sum(K.abs(y_true * y_pred), axis=-1))
    # note: this is |A| + |B|, the Dice denominator, not the set union
    union = K.sum(K.sum(K.abs(y_true) + K.abs(y_pred), axis=-1))
    # if y_true.sum() == 0 and y_pred.sum() == 0:
    #     return 1.0
    return 2*intersection / union
# Example
size = 10
y_true = np.zeros(shape=(size,size))
y_true[3:6,3:6] = 1
y_pred = np.zeros(shape=(size,size))
y_pred[3:5,3:5] = 1
loss = jaccard_distance_loss(y_true,y_pred)
metric = dice_metric(y_pred,y_true)
print(f"loss: {loss}")
print(f"dice_metric: {metric}")
loss: 4.587155963302747
dice_metric: 0.6153846153846154
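The two printed values can be reproduced by hand, which also shows how they relate to the plain IoU. This is only a sketch of the arithmetic for the 10x10 example above (a 3x3 true square containing a 2x2 predicted square); the variable names are illustrative.

```python
intersection = 4      # the 2x2 predicted square lies inside the 3x3 true square
total = 9 + 4         # |y_true| + |y_pred|
smooth = 100

jaccard = (intersection + smooth) / (total - intersection + smooth)
loss = (1 - jaccard) * smooth
dice = 2 * intersection / total
plain_iou = intersection / (total - intersection)  # no smoothing

print(round(loss, 4))       # 4.5872
print(round(dice, 4))       # 0.6154
print(round(plain_iou, 4))  # 0.4444
```

The large smooth term pulls the smoothed Jaccard value toward 1 on small inputs like this one, which is why the loss looks much better than the unsmoothed IoU of 4/9 would suggest.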
You can use the tf.keras.metrics.IoU metric; its documentation is here: https://www.tensorflow.org/api_docs/python/tf/keras/metrics/IoU
Here is how to use it:
import tensorflow as tf
y_true = [0, 1]
y_pred = [0.5, 1.0] # they must be the same shape
# target_class_ids indicates the class/classes you want to calculate IoU on
iou_metric = tf.keras.metrics.IoU(num_classes=2, target_class_ids=[1])
iou_metric.update_state(y_true, y_pred)
print(iou_metric.result().numpy())
And as you can see in the documentation, IoU is calculated by:
true_positives / (true_positives + false_positives + false_negatives)
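To make the role of the label value concrete, here is a small pure-Python sketch of that formula for a single class. It is not the TensorFlow implementation; the function name and the sample labels are made up for illustration. The class id decides which pixels count as positives: a pixel is a true positive when both the label and the prediction equal that id.

```python
def iou_for_class(y_true, y_pred, class_id):
    """IoU of one class from flat lists of integer class ids."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == class_id and p == class_id)
    fp = sum(1 for t, p in pairs if t != class_id and p == class_id)
    fn = sum(1 for t, p in pairs if t == class_id and p != class_id)
    denom = tp + fp + fn
    # The ratio is undefined when the class appears in neither input
    return tp / denom if denom else float("nan")

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
print(iou_for_class(y_true, y_pred, class_id=1))  # 0.5
```

For class 1 there are two true positives, one false positive and one false negative, giving 2 / (2 + 1 + 1) = 0.5.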
When defining a custom loss function for a classification problem, is there a way to access particular elements of y_true and y_pred?
Use-case: multi-label classification problem where I wanna penalize the model extra if I predict a false positive for class 5 i.e. y_true[5] == 0 but y_pred[5] == 1
I'm defining the loss like:
def loss(y_true, y_pred):
    wt = 10 if (y_true[5]==0 and y_pred[5]==1) else 1
    return wt * binary_crossentropy(y_true, y_pred)
I also tried to check if K.gather(y_true, 5) == 0 but that doesn't seem to do it.
My batch size is > 1 (256) and I'm using fit_generator, if that makes any difference. Thanks!
Is there a way to access particular elements of y_true and y_pred?
The indexing of Keras tensors works similarly to the indexing of numpy arrays. The only difference is that the result is a Keras tensor. Therefore, you should use Keras operations subsequently.
Possible implementation of your loss function
For example, here is how your loss function might be implemented:
def loss(y_true, y_pred):
    a = K.equal(y_true[:, 5], 0)
    b = K.greater(y_pred[:, 5], 0.5)
    condition = K.cast(a, 'float') * K.cast(b, 'float')
    wt = 10 * condition + (1 - condition)
    return K.mean(wt[:, None] * K.binary_crossentropy(y_true, y_pred), axis=-1)
NOTE: Not tested.
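The weighting logic can be checked on plain arrays before wiring it into Keras. The following is a NumPy sketch of the same condition, not the Keras loss itself; the function name, the default arguments and the toy batch are assumptions for illustration.

```python
import numpy as np

def false_positive_weights(y_true, y_pred, class_idx=5, penalty=10.0, threshold=0.5):
    """Per-sample weight: `penalty` for a false positive on class_idx, else 1."""
    is_negative = y_true[:, class_idx] == 0
    predicted_positive = y_pred[:, class_idx] > threshold
    condition = (is_negative & predicted_positive).astype(np.float32)
    return penalty * condition + (1.0 - condition)

y_true = np.zeros((3, 8), dtype=np.float32)
y_pred = np.zeros((3, 8), dtype=np.float32)
y_true[1, 5] = 1.0
y_pred[0, 5] = 0.9   # sample 0: false positive for class 5
y_pred[1, 5] = 0.9   # sample 1: true positive for class 5
print(false_positive_weights(y_true, y_pred).tolist())  # [10.0, 1.0, 1.0]
```

Indexing with [:, 5] rather than [5] matters because the first axis of y_true and y_pred is the batch.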
I'm currently trying to implement a siamese-net in Keras where I have to implement the following loss function:
loss(p ∥ q) = Is · KL(p ∥ q) + Ids · HL(p ∥ q)
detailed description of loss function from paper
Where KL is the Kullback-Leibler divergence and HL is the Hinge-loss.
During training, I label same-speaker pairs as 1, different speakers as 0.
The goal is to use the trained net to extract embeddings from spectrograms.
A spectrogram is a 2-dimensional numpy-array 40x128 (time x frequency)
The problem is I never get over 0.5 accuracy, and when clustering speaker-embeddings the results show there seems to be no correlation between embeddings and speakers
I implemented the KL divergence as the distance measure and adjusted the hinge loss accordingly:
def kullback_leibler_divergence(vects):
    x, y = vects
    x = ks.backend.clip(x, ks.backend.epsilon(), 1)
    y = ks.backend.clip(y, ks.backend.epsilon(), 1)
    return ks.backend.sum(x * ks.backend.log(x / y), axis=-1)
def kullback_leibler_shape(shapes):
    shape1, shape2 = shapes
    return shape1[0], 1
def kb_hinge_loss(y_true, y_pred):
    """
    y_true: binary label, 1 = same speaker
    y_pred: output of siamese net i.e. kullback-leibler divergence
    """
    MARGIN = 1.
    hinge = ks.backend.mean(ks.backend.maximum(MARGIN - y_pred, 0.), axis=-1)
    return y_true * y_pred + (1 - y_true) * hinge
A single spectrogram would be fed into a branch of the base network, the siamese-net consists of two such branches, so two spectrograms are fed simultaneously, and joined in the distance-layer. The output of the base network is 1 x 128. The distance layer computes the kullback-leibler divergence and its output is fed into the kb_hinge_loss. The architecture of the base-network is as follows:
def create_lstm(units: int, gpu: bool, name: str, is_sequence: bool = True):
    if gpu:
        return ks.layers.CuDNNLSTM(units, return_sequences=is_sequence, input_shape=INPUT_DIMS, name=name)
    else:
        return ks.layers.LSTM(units, return_sequences=is_sequence, input_shape=INPUT_DIMS, name=name)
def build_model(mode: str = 'train') -> ks.Model:
    topology = TRAIN_CONF['topology']
    is_gpu = tf.test.is_gpu_available(cuda_only=True)
    model = ks.Sequential(name='base_network')
    model.add(
        ks.layers.Bidirectional(create_lstm(topology['blstm1_units'], is_gpu, name='blstm_1'), input_shape=INPUT_DIMS))
    model.add(ks.layers.Dropout(topology['dropout1']))
    model.add(ks.layers.Bidirectional(create_lstm(topology['blstm2_units'], is_gpu, is_sequence=False, name='blstm_2')))
    if mode == 'extraction':
        return model
    num_units = topology['dense1_units']
    model.add(ks.layers.Dense(num_units, name='dense_1'))
    model.add(ks.layers.advanced_activations.PReLU(init='zero', weights=None))
    model.add(ks.layers.Dropout(topology['dropout2']))
    num_units = topology['dense2_units']
    model.add(ks.layers.Dense(num_units, name='dense_2'))
    model.add(ks.layers.advanced_activations.PReLU(init='zero', weights=None))
    num_units = topology['dense3_units']
    model.add(ks.layers.Dense(num_units, name='dense_3'))
    model.add(ks.layers.advanced_activations.PReLU(init='zero', weights=None))
    num_units = topology['dense4_units']
    model.add(ks.layers.Dense(num_units, name='dense_4'))
    model.add(ks.layers.advanced_activations.PReLU(init='zero', weights=None))
    return model
I then build a siamese net as follows:
base_network = build_model()
input_a = ks.Input(shape=INPUT_DIMS, name='input_a')
input_b = ks.Input(shape=INPUT_DIMS, name='input_b')
processed_a = base_network(input_a)
processed_b = base_network(input_b)
distance = ks.layers.Lambda(kullback_leibler_divergence,
                            output_shape=kullback_leibler_shape,
                            name='distance')([processed_a, processed_b])
model = ks.Model(inputs=[input_a, input_b], outputs=distance)
adam = build_optimizer()
model.compile(loss=kb_hinge_loss, optimizer=adam, metrics=['accuracy'])
Lastly, I build a net with the same architecture but only one input and use it to extract embeddings. I then average them, so that the mean embedding serves as the representation of a speaker during clustering:
utterance_embedding = np.mean(embedding_extractor.predict_on_batch(spectrogram), axis=0)
We train the net on the voxceleb speaker set.
The full code can be seen here: GitHub repo
I'm trying to figure out if I have made any wrong assumptions and how to improve my accuracy.
Issue with accuracy
Notice that in your model:
y_true = labels
y_pred = kullback-leibler divergence
These two cannot be compared directly. For example:
For correct results, when y_true == 1 (same speaker), Kullback-Leibler is y_pred == 0 (no divergence).
So it's totally expected that the accuracy metric will not work properly.
Then, either you create a custom metric, or you rely only on the loss for evaluation.
A custom metric only becomes feasible after a few adjustments, as explained below.
Possible issues with the loss
Clipping
This might be a problem
First, notice that you're clipping the values fed into the Kullback-Leibler divergence. This may be bad because clipping has zero gradient in the clipped regions. And since your activation is a PReLU, you have values lower than zero and bigger than 1. So there will certainly be zero-gradient cases here and there, with the risk of a frozen model.
So, you might not want to clip these values. To avoid negative values from the PReLU, you can try a 'softplus' activation, which is a kind of soft ReLU without negative values. You might also add an epsilon to avoid trouble, but there is no problem in leaving values bigger than one:
#considering you used 'softplus' instead of 'PRelu' in speakers
def kullback_leibler_divergence(speakers):
    x, y = speakers
    x = x + ks.backend.epsilon()
    y = y + ks.backend.epsilon()
    return ks.backend.sum(x * ks.backend.log(x / y), axis=-1)
Asymmetry in Kullback-Leibler
This IS a problem
Notice also that Kullback-Leibler is not a symmetric function and, because the network outputs are not normalized probability distributions, it doesn't have its minimum at zero! The perfect match is zero, but bad matches can have lower values, and this is bad for a loss function because it will drive you to divergence.
See this picture showing KL's graph
Your paper states that you should sum two losses: (p||q) and (q||p).
This eliminates the asymmetry and also the negative values.
So:
distance1 = ks.layers.Lambda(kullback_leibler_divergence,
                             name='distance1')([processed_a, processed_b])
distance2 = ks.layers.Lambda(kullback_leibler_divergence,
                             name='distance2')([processed_b, processed_a])
distance = ks.layers.Add(name='dist_add')([distance1,distance2])
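Why the sum cannot go negative: KL(x||y) + KL(y||x) equals the sum over i of (x_i - y_i) * log(x_i / y_i), and each term is a product of two factors with the same sign. A small pure-Python sketch with unnormalized positive vectors (the values are made up for illustration) shows the one-sided version going below zero while the symmetric version does not:

```python
import math

def kl(x, y):
    """Sum of x_i * log(x_i / y_i) for positive, possibly unnormalized vectors."""
    return sum(a * math.log(a / b) for a, b in zip(x, y))

def symmetric_kl(x, y):
    return kl(x, y) + kl(y, x)

x = [0.1, 0.1]   # does not sum to 1, like raw network outputs
y = [0.5, 0.5]

print(kl(x, y) < 0)             # True: a bad match scores below a perfect match
print(kl(x, y) != kl(y, x))     # True: the one-sided version is asymmetric
print(symmetric_kl(x, y) >= 0)  # True
print(symmetric_kl(x, x))       # 0.0
```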
Very low margin and clipped hinge
This might be a problem
Finally, see that the hinge loss also clips values below zero!
Since Kullback-Leibler is not limited to 1, samples with high divergence may not be controlled by this loss. I'm not sure this is really an issue, but you might want to either:
increase the margin
inside the Kullback-Leibler, use mean instead of sum
use a softplus in hinge instead of a max, to avoid losing gradients.
See:
MARGIN = someValue
hinge = ks.backend.mean(ks.backend.softplus(MARGIN - y_pred), axis=-1)
Now we can think of a custom accuracy
This is not very easy, since we don't have clear limits on the KL divergence that tell us "correct/not correct".
You might start from an arbitrary threshold, but you'd need to tune it until it reflects reality. For instance, you may use your validation data to find the threshold that gives the best accuracy.
def customMetric(y_true_targets, y_pred_KBL):
    isMatch = ks.backend.less(y_pred_KBL, threshold)
    isMatch = ks.backend.cast(isMatch, ks.backend.floatx())
    isMatch = ks.backend.equal(y_true_targets, isMatch)
    isMatch = ks.backend.cast(isMatch, ks.backend.floatx())
    return ks.backend.mean(isMatch)
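One simple way to pick the threshold from validation data is to try every observed divergence as a candidate and keep the one with the best accuracy. This is a pure-Python sketch of that search, not part of the model code; the helper names and the validation values are hypothetical.

```python
def accuracy_at(threshold, divergences, labels):
    """Accuracy when pairs with divergence below `threshold` are called a match (1)."""
    predictions = [1 if d < threshold else 0 for d in divergences]
    correct = sum(int(p == t) for p, t in zip(predictions, labels))
    return correct / len(labels)

def best_threshold(divergences, labels):
    candidates = sorted(set(divergences))
    # Also try a value above the maximum so "everything matches" is covered
    candidates.append(candidates[-1] + 1.0)
    return max(candidates, key=lambda t: accuracy_at(t, divergences, labels))

# Hypothetical validation divergences and labels (1 = same speaker)
val_divergences = [0.1, 0.3, 0.4, 1.2, 2.5, 3.0]
val_labels = [1, 1, 1, 0, 0, 0]

threshold = best_threshold(val_divergences, val_labels)
print(threshold, accuracy_at(threshold, val_divergences, val_labels))  # 1.2 1.0
```

The resulting value would then be the fixed threshold used inside the custom metric.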
I need to create a custom loss function in Keras and depending on the result of the conditional return two different loss values. I am having trouble getting the if statement to run properly.
I need to do something similar to this:
def custom_loss(y_true, y_pred):
    sees = tf.Session()
    const = 2
    if (sees.run(tf.keras.backend.less(y_pred, y_true))): #i.e. y_pred - y_true < 0
        return const * mean_squared_error(y_true, y_pred)
    else:
        return mean_squared_error(y_true, y_pred)
I keep getting tensor errors (see below) when trying to run this. Any help/advice will be appreciated!
InvalidArgumentError: You must feed a value for placeholder tensor 'dense_63_target' with dtype float and shape [?,?]
[[Node: dense_63_target = Placeholder[dtype=DT_FLOAT, shape=[?,?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Instead, multiply by a mask to get the function you want:
import keras.backend as K
def custom_1loss(y_true, y_pred):
    const = 2
    # 1.0 where y_pred < y_true (i.e. y_pred - y_true < 0), else 0.0
    mask = K.cast(K.less(y_pred, y_true), K.floatx())
    # under-predicted elements are weighted by const, all others by 1
    weights = (const - 1) * mask + 1
    return K.mean(weights * K.square(y_pred - y_true), axis=-1)
This has the desired effect: wherever y_pred is an under-prediction, another squared-error term is added. The boolean mask is cast to a float tensor so that it can be multiplied with the error, and it is applied per element before averaging so that the shapes match.
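The effect of the mask can be checked on single numbers. This is a scalar sketch of the weighting, not the Keras loss; the function name is made up for illustration.

```python
def asymmetric_squared_error(y_true, y_pred, const=2.0):
    """Squared error, multiplied by `const` when y_pred under-predicts y_true."""
    mask = 1.0 if y_pred < y_true else 0.0
    weight = (const - 1.0) * mask + 1.0
    return weight * (y_pred - y_true) ** 2

print(asymmetric_squared_error(1.0, 0.5))  # 0.5  (under-prediction, doubled)
print(asymmetric_squared_error(1.0, 1.5))  # 0.25 (over-prediction, plain)
```

No Python if statement is evaluated on a tensor in the masked version, which is what makes it usable inside a compiled loss.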
Also, as unsolicited advice on your approach in general: I think you would get better results with a different loss.
import keras.backend as K
from keras.losses import mean_squared_error
def custom_loss2(y_true, y_pred):
    beta = 0.1
    return mean_squared_error(y_true, y_pred) + beta*K.mean(y_true - y_pred)
Observe the difference in gradient behavior:
https://www.desmos.com/calculator/uubwgdhpi6
The second loss function shifts the location of the local minimum to a slight over-prediction rather than an under-prediction (which is what you want). The loss function you gave still has its optimum at zero mean error, just with gradients of different strength on each side. It will most likely converge, more slowly, to the same result as MSE, rather than producing a model that would rather over-predict than under-predict. I hope this makes sense.
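For a single scalar prediction the shift can be worked out directly: with e = y_true - y_pred the loss is e^2 + beta * e, which is minimized at e = -beta / 2, that is at y_pred = y_true + beta / 2. A short pure-Python check of this (the grid search and names are only for illustration):

```python
def shifted_loss(y_true, y_pred, beta=0.1):
    """Squared error plus a linear term that penalizes under-prediction."""
    error = y_true - y_pred
    return error ** 2 + beta * error

y_true = 1.0
# Scan candidate predictions from 0.5 to 1.5 in steps of 0.001
candidates = [i / 1000 for i in range(500, 1501)]
best = min(candidates, key=lambda p: shifted_loss(y_true, p))
print(best)  # 1.05, i.e. y_true + beta / 2
```

So beta = 0.1 moves the optimum to an over-prediction of 0.05 in the units of the target.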
I am trying to implement a custom objective function in the Keras framework.
Specifically, a weighted average function that takes the two tensors y_true and y_pred as arguments; the weights are derived from the y_true tensor.
Is there a weighted average function in TensorFlow?
Or are there any other suggestions on how to implement this kind of loss function?
My function would look something like this:
function(y_true,y_pred)
    A=(y_true-y_pred)**2
    w - derivable from y_true, tensor of same shape as y_true
    return average(A, weights=w) <-- a scalar
y_true and y_pred are 3D tensors.
You can use one of the existing objectives (also called losses) in Keras from here.
You may also implement your own custom loss function:
from keras import backend as K
def my_loss(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)
# Let's train the model using SGD
model.compile(loss=my_loss, optimizer='SGD', metrics=['accuracy'])
Notice the K module: it's the Keras backend, which you should use to get the full performance of Keras. Don't do something like this unless you don't care about performance issues:
def my_bad_and_slow_loss(y_true, y_pred):
    return sum((y_pred - y_true) ** 2, axis=-1)
For your specific case, please write out your desired objective function if you need help implementing it.
Update
You can try this to provide weights W inside the loss function:
def my_loss(y_true, y_pred):
    W = np.arange(9) / 9. # some example W
    return K.mean(K.pow(y_true - y_pred, 2) * W)
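Note that taking the mean of the weighted squares divides by the number of elements, whereas a weighted average in the usual sense divides by the sum of the weights. Here is a pure-Python sketch of the latter, with made-up values; in a Keras loss the same ratio could presumably be written with K.sum in the numerator and denominator.

```python
def weighted_average(values, weights):
    """Sum of values * weights divided by the sum of the weights."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.0]
weights = [1.0, 1.0, 2.0]   # hypothetical weights derived from y_true

squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
print(weighted_average(squared, weights))  # 0.5625
```

Here the squared errors are 0.25, 0 and 1, so the weighted sum is 2.25 and the weights sum to 4. The two forms differ only by a constant factor when the weights are fixed, but not when they change from batch to batch.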