I wanted to use a FCN (kind of U-Net) in order to make some semantic segmentation.
I performed it using Python & Keras based on Tensorflow backend. Now I have good results, I'm trying to improve them, and I think one way to do such a thing is by improving my loss computation.
I know that in my output, the several classes are imbalanced, and using the default categorical_crossentropy function can be a problem.
My model inputs and outputs are both in the float32 format, input are channel_first and output and channel_last (permutation done at the end of the model)
In the binary case, when I only want to segment one class, I have change the loss function in this way so it can add the weights case by case depending on the content of the output :
def weighted_loss(y_true, y_pred):
def weighted_binary_cross_entropy(y_true, y_pred):
w = tf.reduce_sum(y_true)/tf_cast(tf_size(y_true), tf_float32)
real_th = 0.5-th
tf_th = tf.fill(tf.shape(y_pred), real_th)
tf_zeros = tf.fill(tf.shape(y_pred), 0.)
return (1.0 - w) * y_true * - tf.log(tf.maximum(tf.zeros, tf.sigmoid(y_pred) + tf_th)) +
(1- y_true) * w * -tf.log(1 - tf.maximum(tf_zeros, tf.sigmoid(y_pred) + tf_th))
return weighted_binary_coss_entropy
Note that th is the activation threshold which by default is 1/nClasses and which I have changed in order to see what value gives me the best results
What do you think about it?
What about change it so it will be able to compute the weighted categorical cross entropy (in the case of multi-class)
Your implementation will work for binary classes , for multi class it will just be
-y_true * tf.log(tf.sigmoid(y_pred))
and use inbuilt tensorflow method for calculating categorical entropy as it avoids overflow for y_pred<0
you can view this answer Unbalanced data and weighted cross entropy ,it explains weighted categorical cross entropy implementation.
The only change for categorical_crossentropy would be
def weighted_loss(y_true, y_pred):
def weighted_categorical_cross_entropy(y_true, y_pred):
w = tf.reduce_sum(y_true)/tf_cast(tf_size(y_true), tf_float32)
loss = w * tf.nn.softmax_cross_entropy_with_logits(onehot_labels, logits)
return loss
return weighted_categorical_cross_entropy
extracting prediction for individual class
def loss(y_true, y_pred):
s = tf.shape(y_true)
# if number of output classes is at last
number_classses = s[-1]
# this will give you one hot code for your prediction
clf_pred = tf.one_hot(tf.argmax(y_pred, axis=-1), depth=number_classses, axis=-1)
# extract the values of y_pred where y_pred is max among the classes
prediction = tf.where(tf.equal(clf_pred, 1), y_pred, tf.zeros_like(y_pred))
# if one hotcode == 1 then class1_prediction == y_pred else class1_prediction ==0
class1_prediction = prediction[:, :, :, 0:1]
# you can compute your loss here on individual class and return the loss ,just for simplicity i am returning the class1_prediction
return class1_prediction
output from model
y_pred = [[[[0.5, 0.3, 0.7],
[0.6, 0.3, 0.2]]
,
[[0.7, 0.9, 0.6],
[0.3 ,0.9, 0.3]]]]
corresponding ground truth
y_true = [[[[0, 1, 0],
[1 ,0, 0]]
,
[[1,0 , 0],
[0,1, 0]]]]
prediction for class 1
prediction = loss(y_true, y_pred)
# prediction = [[[[0. ],[0.6]],[0. ],[0. ]]]]
Related
I am trying to implement logistic regression with Tensorflow. I assume that I have the labels in the form of {-1, 1}. So, I have implemented the decision function and loss function
def cross_entropy(y_pred, y_true):
return tf.reduce_mean(tf.math.log(1 + tf.math.exp(- y_true * y_pred[:, 0] ))) + tf.nn.l2_loss(W)`
def logistic_regression(x):
return tf.matmul(x, W) + b
Is this correct? The loss is nan.
This is an option,
def logistic_regression(x):
# Apply softmax to normalize the logits to a probability distribution.
return tf.nn.softmax(tf.matmul(x, W) + b)
def cross_entropy(y_pred, y_true):
# Encode label to a one hot vector.
y_true = tf.one_hot(y_true, depth=num_classes)
# Clip prediction values to avoid log(0) error.
y_pred = tf.clip_by_value(y_pred, 1e-9, 1.)
# Compute cross-entropy.
return tf.reduce_mean(-tf.reduce_sum(y_true * tf.math.log(y_pred)))
Take a look at this full implementation
https://builtin.com/data-science/guide-logistic-regression-tensorflow-20
I have an image segmentation problem I have to solve in TensorFlow 2.
In particular I have a training set composed by aerial images paired with their respective masks. In a mask the terrain is colored in black and the buildings are colored in white. The purpose is to predict the mask for the images in the test set.
I use a UNet with a final Conv2DTranspose with 1 filter and a sigmoid activation function. The prediction is made in the following way on the output of the final sigmoid layer: if y_pred>0.5, then it's a building, otherwise it's the background.
I want to implement a dice loss, so I wrote the following function
def dice_loss(y_true, y_pred):
print("[dice_loss] y_pred=",y_pred,"y_true=",y_true)
y_pred = tf.cast(y_pred > 0.5, tf.float32)
y_true = tf.cast(y_true, tf.float32)
numerator = 2 * tf.reduce_sum(y_true * y_pred)
denominator = tf.reduce_sum(y_true + y_pred)
return 1 - numerator / denominator
which I pass to TensorFlow in the following way:
loss = dice_loss
optimizer = tf.keras.optimizers.Adam(learning_rate=config.learning_rate)
metrics = [my_IoU, 'acc']
model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
but at training time TensorFlow throw me the following error:
ValueError: No gradients provided for any variable:
The problem is in your loss function (obviously). Particularly, the following operation.
y_pred = tf.cast(y_pred > 0.5, tf.float32)
This is not a differentiable operation. Which results in Gradients being None. Change your loss function to the following and it will work.
def dice_loss(y_true, y_pred):
print("[dice_loss] y_pred=",y_pred,"y_true=",y_true)
y_true = tf.cast(y_true, tf.float32)
numerator = 2 * tf.reduce_sum(y_true * y_pred)
denominator = tf.reduce_sum(y_true + y_pred)
return 1 - numerator / denominator
Assume that y_true and y_pred are in [-1,1]. I want a weighted mean-square-error loss function, in which the loss for samples that are positive in the y_true and negative in y_pred or vice versa are weighted by exp(alpha). Here is my code:
import keras.backend as K
alpha = 1.0
def custom_loss(y_true, y_pred):
se = K.square(y_pred-y_true)
true_label = K.less_equal(y_true,0.0)
pred_label = K.less_equal(y_pred,0.0)
return K.mean(se * K.exp(alpha*K.cast(K.not_equal(true_label,pred_label), tf.float32)))
And here is a plot of this loss function. Different curves are for different values for y_true.
I want to know:
Whether this is a valid loss function, since it is not differentiable in 0?
Is my code correct?
I suggest you this type of loss function to handle imbalance dataset
def focal_loss(y_true, y_pred):
gamma = 2.0, alpha = 0.25
pt_1 = tf.where(tf.equal(y_true, 1), y_pred, tf.ones_like(y_pred))
pt_0 = tf.where(tf.equal(y_true, 0), y_pred, tf.zeros_like(y_pred))
return -K.sum(alpha * K.pow(1. - pt_1, gamma) * K.log(pt_1))-K.sum((1-alpha) * K.pow(pt_0, gamma) * K.log(1. - pt_0))
from this source
I am writing to you because I have a problem, the same as the question in the link below that has not been answered.
Previous question
I have defined a loss function that excludes all pixel = 0:
def ignore_unknown_xentropy(ytrue, ypred):
return (1-ytrue[:, :, :, 0]) * K.categorical_crossentropy(ytrue, ypred)
I noticed that during the training accuracy is distorted, going to consider the pixels equal to zero that I had previously excluded.
There is a way to define a custom metric to exclude zero pixels on y_true from its calculation?
I'm using this approach, that is to calculate the accuracy for each relevant label and add them, but it does not give me reliable results:
def single_class_accuracy(y_true, y_pred, class_id):
class_id_true = K.argmax(y_true, axis=-1)
class_id_preds = K.argmax(y_pred, axis=-1)
accuracy_mask = K.cast(K.equal(class_id_preds, class_id), 'int32')
class_acc_tensor = K.cast(K.equal(class_id_true, class_id_preds), 'int32') * accuracy_mask
class_acc = K.sum(class_acc_tensor) / K.maximum(K.sum(accuracy_mask), 1)
return class_acc
def custom_accuracy(y_true, y_pred):
return single_class_accuracy(y_true, y_pred, 1) + single_class_accuracy(y_true, y_pred, 2)
And my ytrue is a one-hot encoded mask.
Is there no way to achieve this just with class weighting?
Set 0 for the class you are not interested in?
I may have misunderstood, but this approach lept out to me (ignore class zero):
class_weights = {0.0:0.2,0.2:0.2:0.2}
history=model.fit(x=in_train,y=target_train,class_weight=class_weights,
validation_data=.... etc
Let $F \in \mathbb{R}^{S \times F}$ be a matrix of features, I want to classify them using logistic regression with autograd [1]. The code I am using is similar to the one in the following example [2].
The only thing I want to change is that I have an additional weight matrix $W$ in $\mathbb{R}^{F \times L}$ that I want to apply to each feature. So each feature is multiplied with $W$ and then feed into the logistic regression.
Is it somehow possible to train $W$ and the weights of the logistic regression simultaneously using autograd?
I have tried the following code, unfortunately the weights stay at value 0.
import autograd.numpy as np
from autograd import grad
global inputs
def sigmoid(x):
return 0.5 * (np.tanh(x) + 1)
def logistic_predictions(weights, inputs):
# Outputs probability of a label being true according to logistic model.
return sigmoid(np.dot(inputs, weights))
def training_loss(weights):
global inputs
# Training loss is the negative log-likelihood of the training labels.
feature_weights = weights[3:]
feature_weights = np.reshape(feature_weights, (3, 3))
inputs = np.dot(inputs, feature_weights)
preds = logistic_predictions(weights[0:3], inputs)
label_probabilities = preds * targets + (1 - preds) * (1 - targets)
return -np.sum(np.log(label_probabilities))
# Build a toy dataset.
inputs = np.array([[0.52, 1.12, 0.77],
[0.88, -1.08, 0.15],
[0.52, 0.06, -1.30],
[0.74, -2.49, 1.39]])
targets = np.array([True, True, False, True])
# Define a function that returns gradients of training loss using autograd.
training_gradient_fun = grad(training_loss)
# Optimize weights using gradient descent.
weights = np.zeros([3 + 3 * 3])
print "Initial loss:", training_loss(weights)
for i in xrange(100):
print(i)
print(weights)
weights -= training_gradient_fun(weights) * 0.01
print "Trained loss:", training_loss(weights)
[1] https://github.com/HIPS/autograd
[2] https://github.com/HIPS/autograd/blob/master/examples/logistic_regression.py
Typical practice is to concatenate all "vectorized" parameters into the decision variables vector.
If you update logistic_predictions to include the W matrix, via something like
def logistic_predictions(weights_and_W, inputs):
'''
Here, :arg weights_and_W: is an array of the form [weights W.ravel()]
'''
# Outputs probability of a label being true according to logistic model.
weights = weights_and_W[:inputs.shape[1]]
W_raveled = weights_and_W[inputs.shape[1]:]
n_W = len(W_raveled)
W = W_raveled.reshape(inputs.shape[1], n_W/inputs.shape[1])
return sigmoid(np.dot(np.dot(inputs, W), weights))
then simply change traning_loss to (from the original source example)
def training_loss(weights_and_W):
# Training loss is the negative log-likelihood of the training labels.
preds = logistic_predictions(weights_and_W, inputs)
label_probabilities = preds * targets + (1 - preds) * (1 - targets)
return -np.sum(np.log(label_probabilities))