I wrote a squared loss function for categorisation of one-hot encoded data:
import numpy as np
from tensorflow.keras import backend as K

def squared_categorical_loss(y_true, y_pred):
    # mean over the batch of (1 - p_correct)^2, where p_correct is the predicted
    # value at the true class (picked out by the one-hot mask)
    return K.mean(K.square(1.0 - K.sum(y_true * y_pred, axis=1)))
which works when given numpy array examples, such as
y_true = np.asarray([[1,0,0],[0,1,0]])
y_pred = np.asarray([[0.5,0.2,0.3],[0.4,0.6,0]])
squared_categorical_loss(y_true, y_pred)
The example above returns a tensor with the value 0.205, which is the mean of (1-0.5)^2 and (1-0.6)^2. That is the desired result, and it should be an optimisable loss function that generally correlates with accuracy. But when I apply it to a TensorFlow model with
model.compile(optimizer='adam',
              loss=squared_categorical_loss,
              metrics=['accuracy'])
the loss decreases to extremely small values while the training accuracy stays below 50%. That shouldn't be possible: a loss below 0.125 can't be achieved mathematically without the accuracy rising above 50%. So what is wrong with my implementation?
Thanks!
It will work only if y_pred is normalised (sums to 1).
I think you forgot to apply softmax in the last layer of your model.
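For example, if the model ends in a Dense layer, adding a softmax activation there normalises the outputs (a minimal sketch; the layer width and the tensor x are assumptions):

from tensorflow.keras import layers

# hypothetical final layer: softmax makes the class scores sum to 1,
# so K.sum(y_true * y_pred, axis=1) is the probability of the true class
outputs = layers.Dense(3, activation='softmax')(x)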
Related
Is there a way to have two loss functions in Keras in which the second loss function takes the output from the first loss function?
I am working on a neural network with Keras and I want to add another custom function to the loss term inside model.compile() to regularise and somehow penalise the model. The current call has the form:
model.compile(loss_1='mean_squared_error', optimizer=Adam(lr=learning_rate), metrics=['mae'])
I would like to add another loss term (loss_2) that is the sum of the predicted values from the loss_1 model, so that I can tell the neural network to minimise the sum of its predicted outputs. How can I do that?
Something like:
model.compile(loss_1='mean_squared_error', loss_2= np.sum(****PREDICTED_OUTPUT_FROM_LOSS_FUNCTION_1****), optimizer=Adam(lr=learning_rate), metrics=['mae'])
How can this be implemented?
You should define a custom loss function:
import tensorflow as tf

def custom_loss_function(y_true, y_pred):
    # combine a squared-error term and an absolute-error term into one loss
    squared_difference = tf.square(y_true - y_pred)
    absolute_difference = tf.abs(y_true - y_pred)
    loss = tf.reduce_mean(squared_difference, axis=-1) + tf.reduce_mean(absolute_difference, axis=-1)
    return loss

model.compile(optimizer='adam', loss=custom_loss_function)
I believe that would solve your problem.
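If the goal is specifically to penalise the sum of the predicted values, as the question asks, the same custom-loss pattern can carry that term (a sketch; the coefficient lambda_reg is a hypothetical weight you would tune):

import tensorflow as tf

def mse_plus_sum_penalty(y_true, y_pred):
    lambda_reg = 0.01  # hypothetical weight for the penalty term
    mse = tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)
    # extra term that pushes the network to keep its predicted values small
    return mse + lambda_reg * tf.reduce_sum(y_pred, axis=-1)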
I have
y_true = 16
and
y_pred = array([1.1868494e-08, 1.8747659e-09, 1.2777099e-11, 3.6140797e-08,
6.5852622e-11, 2.2888577e-10, 1.4515833e-09, 2.8392664e-09,
4.7054605e-10, 9.5605066e-11, 9.3647139e-13, 2.6149302e-10,
2.5338919e-14, 4.8815413e-10, 3.9381631e-14, 2.1434269e-06,
9.9999785e-01, 3.0857247e-08, 1.3536775e-09, 4.6811921e-10,
3.0638234e-10, 2.0818169e-09, 2.9950772e-10, 1.0457132e-10,
3.2959850e-11, 3.4232595e-10, 5.1689473e-12], dtype=float32)
When I use tf.keras.losses.categorical_crossentropy(to_categorical(y_true, num_classes=27), y_pred, from_logits=True)
The loss value I get is 2.3575358.
But if I compute the loss value directly from the formula for categorical cross-entropy,
-np.sum(to_categorical(gtp_out_true[0], num_classes=27) * np.log(gtp_pred[0]))
I get the value 2.1457695e-06.
Now, my question is: why does tf.keras.losses.categorical_crossentropy give a different value?
The strange thing is that my model gives 100% accuracy even though the loss is stuck at 2.3575.
[Plot of accuracy and loss during training omitted.]
What formula does TensorFlow use to calculate categorical cross-entropy?
Found where the problem is. I used a softmax activation in my last layer:
output = Dense(NUM_CLASSES, activation='softmax')(x)
But I used from_logits=True in tf.keras.losses.categorical_crossentropy, which resulted in softmax being applied again on the output of the last layer (which was already softmax(logits)). So, the output argument that I was passing to the loss function was softmax(softmax(logits)).
Hence the anomaly in the loss values.
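As a sanity check on that stuck value (a back-of-the-envelope sketch): the first softmax makes the correct class ~1.0 and the other 26 classes ~0.0, so the second softmax gives the correct class about e^1 of the unnormalised mass and each other class about e^0.

import numpy as np

# probability of the correct class after the second softmax
p_correct = np.e / (26 + np.e)
print(-np.log(p_correct))  # ~2.3575, matching the stuck loss

Softmax is monotonic and preserves the argmax, which is why accuracy can still be 100% while the loss stays pinned there.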
When using softmax as the activation in the last layer, we should use from_logits=False.
y_pred is a probability vector, so you should not use from_logits=True. Set it to False and you get:
>>> print(categorical_crossentropy(to_categorical(16, num_classes=27),
...                                 y_pred, from_logits=False).numpy())
2.264979e-06
The reason it is not exactly the expected 2.1457695e-06 is, I believe, that y_pred[16] is very close to 1.0 and categorical_crossentropy adds some smoothing: it re-normalises y_pred and clips it away from 0 and 1.
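You can reproduce the reported number by hand (a sketch; the normalise-then-clip behaviour and the epsilon of 1e-7 are assumptions about the Keras backend):

import numpy as np

eps = 1e-7                       # tf.keras.backend.epsilon()
p = y_pred / y_pred.sum()        # categorical_crossentropy normalises y_pred first...
p = np.clip(p, eps, 1.0 - eps)   # ...then clips it away from 0 and 1
print(-np.log(p[16]))            # slightly above the raw -np.log(y_pred[16])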
See the answer here for a discussion on logits: What is the meaning of the word logits in TensorFlow?
You can also use the sparse version of the function if each input value can only have one label:
print(sparse_categorical_crossentropy(16, y_pred))
I'm training a CNN architecture to solve a regression problem using PyTorch, where my output is a tensor of 20 values. I planned to use RMSE as my loss function and tried to use PyTorch's nn.MSELoss(), taking the square root of the result with torch.sqrt(), but got confused after obtaining the results. I'll try my best to explain why. For a batch size bs, my output tensor's dimensions would be [bs, 20]. I tried to implement an RMSE function of my own:
import torch

def loss_function(predicted_x, target):
    # mean of the squares over the 20 outputs of each example
    loss = torch.sum(torch.square(predicted_x - target), axis=1) / predicted_x.size()[1]
    loss = torch.sqrt(loss)                          # per-example RMSE
    loss = torch.sum(loss) / predicted_x.size()[0]   # averaging out by batch size
    return loss
But the output of my loss_function() differs from what PyTorch computes with nn.MSELoss(). I'm not sure whether my implementation is wrong or whether I'm using nn.MSELoss() in the wrong way.
The MSE loss is the mean of the squares of the errors. You're taking the square root after computing the MSE, so there is no way to compare your loss function's output to that of PyTorch's nn.MSELoss(): they compute different values.
However, you could just use nn.MSELoss() to create your own RMSE loss function:
loss_fn = nn.MSELoss()
RMSE_loss = torch.sqrt(loss_fn(prediction, target))
RMSE_loss.backward()
Hope that helps.
To replicate PyTorch's default MSE (mean squared error) loss function, you need to change your loss_function method to the following:
def loss_function(predicted_x, target):
    loss = torch.sum(torch.square(predicted_x - target), axis=1) / predicted_x.size()[1]
    loss = torch.sum(loss) / loss.shape[0]
    return loss
Here is why the above method works: MSE loss means mean squared error loss, so you do not need to implement a square root (torch.sqrt) in your code. By default, the loss in PyTorch averages over all examples in the batch, hence the second line in the method.
To implement RMSELoss and integrate it into your training, you can do it like this:
class RMSELoss(torch.nn.Module):
    def __init__(self):
        super(RMSELoss, self).__init__()

    def forward(self, x, y):
        criterion = nn.MSELoss()
        loss = torch.sqrt(criterion(x, y))
        return loss
And you can call this class similarly to any loss function in PyTorch.
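For instance (a minimal usage sketch; the tensor shapes are assumptions):

import torch
import torch.nn as nn

criterion = RMSELoss()
prediction = torch.randn(8, 20, requires_grad=True)  # hypothetical batch of 8 examples, 20 outputs
target = torch.randn(8, 20)
loss = criterion(prediction, target)
loss.backward()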
I am building an image segmentation model using Keras and I want to train my model on multiple loss functions. I have seen this link, but I am looking for a simpler and more straightforward solution for this situation, as my loss functions are quite complex. Can someone tell me how to build a model with a single output and multiple losses in Keras?
You can use multiple losses with one output via a weighted loss, i.e. a sum of your losses, each multiplied by a weight. Create a custom loss that returns the sum of the other losses with coefficients, and pass it to model.compile. There is an example here.
This is just an example from here. You could play around with it.
import tensorflow as tf

def custom_losses(y_true, y_pred):
    alpha = 0.6
    # weighted sum of a squared-error term and the Huber loss
    squared_difference = tf.square(y_true - y_pred)
    huber = tf.keras.losses.huber(y_true, y_pred)
    return tf.reduce_mean(squared_difference, axis=-1) + alpha * huber

model.compile(optimizer='adam', loss=custom_losses, metrics=['MeanSquaredError'])
I currently have a multi-output model:
model = Model(inputs=x1, outputs=[y1, y2])
model.compile(optimizer='sgd', loss=[custom_loss, 'mse'])
What are the y_true and y_pred values for the mse loss function here? Is the y_true for mse the output of y2 alone, or of both y1 and y2?
In my custom_loss I need to receive y_true and y_pred from both outputs separately for the calculation:
def custom_loss(y1_true, y1_pred, y2_true, y2_pred):
How can I do this?
Unfortunately you cannot define a 'global' loss function.
A loss function is always computed only on one output (see the pseudo-code in the accepted answer).
In your example the custom loss will be computed on y1_true and y1_pred, while the mse will be computed on y2_true and y2_pred.
If you want a custom loss that includes both y1 and y2 outputs, I can think of two ways:
Collapse multiple outputs into one output: if y1 and y2 are similar vectors, you could concatenate them in order to have only one output. Then in your custom loss you apply some indexing/slicing in order to separate the two outputs (see the sketch after this answer).
Make the loss an output of your model: create a custom network graph (using the Keras functional API and the backend) that computes the loss, taking y1_true and y2_true as inputs to the network. Your final model will then have 3 outputs: y1_pred, y2_pred and the loss. After training you can discard the parts of the model you are no longer interested in (the y_true inputs and the loss output).
I remember that I had a similar problem in the past and I chose to implement option 2, but it was kind of a pain.
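For reference, here is a minimal sketch of option 1, the concatenation approach (the output width N1, the loss terms and the equal weighting are all assumptions):

import tensorflow as tf
from tensorflow.keras import backend as K

N1 = 10  # hypothetical width of y1; y2 occupies the remaining columns

def combined_loss(y_true, y_pred):
    # slice the concatenated output back into the two original targets
    y1_true, y2_true = y_true[:, :N1], y_true[:, N1:]
    y1_pred, y2_pred = y_pred[:, :N1], y_pred[:, N1:]
    custom = K.mean(K.abs(y1_true - y1_pred), axis=-1)  # stand-in for your custom loss
    mse = K.mean(K.square(y2_true - y2_pred), axis=-1)
    return custom + mse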