This may be more of a Tensorflow gradient question. I have been attempting to implement Intersection over Union (IoU) as losses and have been running into some problems. To the point, here is the snippet of my code that computes the IoU:
def get_iou(masks, predictions):
ious = []
for i in range(batch_size):
mask = masks[i]
pred = predictions[i]
masks_sum = tf.reduce_sum(mask)
predictions_sum = tf.reduce_mean(pred)
intersection = tf.reduce_sum(tf.multiply(mask, pred))
union = masks_sum + predictions_sum - intersection
iou = intersection / union
ious.append(iou)
return ious
iou = get_iou(masks, predictions)
mean_iou_loss = -tf.log(tf.reduce_sum(iou))
train_op = tf.train.AdamOptimizer(0.001).minimize(mean_iou_loss)
It works as predicted. However, the issue that I am having is the losses do not decrease. The model does train, though the results are less than ideal so I am wondering if I am implementing it correctly. Do I have to compute the gradients myself? I can compute the gradients for this IoU loss derived by this paper using tf.gradients(), though I am not sure how to incorporate that with the tf.train.AdamOptimizer(). Reading the documentation, I feel like compute_gradients and apply_gradients are the commands that I need to use, but I can't find any examples on how to use them. My understanding is that the Tensorflow graph should be able to come up with the gradient itself via chain rule. So is a custom gradient even necessary in this problem? If the custom gradient is not necessary then I may just have an ill-posed problem and need to adjust some hyperparameters.
Note: I have tried Tensorflow's implementation of the IoU, tf.metrics.mean_iou(), but it spits out inf every time so I have abandoned that.
Gradient computation occurs inside optimizer.minimize function, so, no explicit use inside loss function is needed. However, your implementation simply lacks an optimizable, trainable variable.
iou = get_iou(masks, predictions)
mean_iou_loss = tf.Variable(initial_value=-tf.log(tf.reduce_sum(iou)), name='loss', trainable=True)
train_op = tf.train.AdamOptimizer(0.001).minimize(mean_iou_loss)
Numerical stability, differentiability and particular implementation aside, this should be enough to use it as a loss function, which will change with iterations.
Also take a look:
https://arxiv.org/pdf/1902.09630.pdf
Why does one not use IOU for training?
Related
I want to calculate L1 loss in a neural network, I came across this example at https://discuss.pytorch.org/t/simple-l2-regularization/139/2, but there are some errors in this code.
Is this really how to calculate L1 Loss in a NN or is there a simpler way?
l1_crit = nn.L1Loss()
reg_loss = 0
for param in model.parameters():
reg_loss += l1_crit(param)
factor = 0.0005
loss += factor * reg_loss
Is this equivalent in any way to simple doing:
loss = torch.nn.L1Loss()
I assume not, because I am not passing along any network parameters. Just checking if there isn existing function to do this.
If I am understanding well, you want to compute the L1 loss of your model (as you say in the begining). However I think you might got confused with the discussion in the pytorch forum.
From what I understand, in the Pytorch forums, and the code you posted, the author is trying to normalize the network weights with L1 regularization. So it is trying to enforce that weights values fall in a sensible range (not too big, not too small). That is weights normalization using L1 normalization (that is why it is using model.parameters()). Normalization takes a value as input and produces a normalized value as output.
Check this for weights normalization: https://pytorch.org/docs/master/generated/torch.nn.utils.weight_norm.html
On the other hand, L1 Loss it is just a way to determine how 2 values differ from each other, so the "loss" is just measure of this difference. In the case of L1 Loss this error is computed with the Mean Absolute Error loss = |x-y| where x and y are the values to compare. So error compute takes 2 values as input and produces a value as output.
Check this for loss computing: https://pytorch.org/docs/master/generated/torch.nn.L1Loss.html
To answer your question: no, the above snippets are not equivalent, since the first is trying to do weights normalization and the second one, you are trying to compute a loss. This would be the loss computing with some context:
sample, target = dataset[i]
target_predicted = model(sample)
loss = torch.nn.L1Loss()
loss_value = loss(target, target_predicted)
I want to train my neural network (in Keras) with an additional condition on the output elements.
An example:
Minimize my loss function MSE between network output y_pred and y_true.
Additionally, ensure that the norm of y_pred is less or equal 1.
Without the condition, the task is straightforward.
Note: The condition is not necessarily the vector norm of y_pred.
How can I implement the additional condition/restriction in a Keras (or maybe Tensorflow) model?
In principle, tensorflow (and keras) don't allow you to add hard constraints to your model.
You have to convert your invarient (norm <= 1) to a penalty function, which is added to the loss. This could look like this:
y_norm = tf.norm(y_pred)
norm_loss = tf.where(y_norm > 1, y_norm, 0)
total_loss = mse + norm_loss
Look at the docs of where. If your prediction has a norm bigger than one, backpropagation tries to minimize the norm. If it is less than or equal, this part of the loss is simply 0. No gradient is produced.
But this can be very hard to optimize. Your predictions could oscillate around a norm of 1. It is also possible to add a factor: total_loss = mse + 1000* norm_loss. Be very careful with this, it makes optimization even harder.
In the example above, the norm above one contributes linearly to the loss. This is called l1-regularization. You could also square it, which would become l2-regularization.
In your specific case, you could get creative. Why not normalize your predictions and the targets to one (just a suggestion, might be a bad idea)?
loss = mse(y_pred / tf.norm(y_pred), y_target / np.linalg.norm(y_target)
I am pretty new to neural networks. I am training a network in tensorflow, but the number of positive examples is much much less than negative examples in my dataset (it is a medical dataset).
So, I know that F-score calculated from precision and recall is a good measure of how well the model is trained.
I have used error functions like cross-entropy loss or MSE before, but they are all based on accuracy calculation (if I am not wrong). But how do I use this F-score as an error function? Is there a tensorflow function for that? Or I have to create a new one?
Thanks in advance.
It appears approaches for optimising directly for these types of metrics have been devised and used successfully, improving scoring and or training times:
https://www.kaggle.com/c/human-protein-atlas-image-classification/discussion/77289
https://www.kaggle.com/c/human-protein-atlas-image-classification/discussion/70328
https://www.kaggle.com/rejpalcz/best-loss-function-for-f1-score-metric
One such method involves using the sums of probabilities, in place of counts, for the sets of true positives, false positives, and false negative metrics. For example F-beta loss (the generalisation of F1) can be calculated in with Torch in Python as follows:
def forward(self, y_logits, y_true):
y_pred = self.sigmoid(y_logits)
TP = (y_pred * y_true).sum(dim=1)
FP = ((1 - y_pred) * y_true).sum(dim=1)
FN = (y_pred * (1 - y_true)).sum(dim=1)
fbeta = (1 + self.beta**2) * TP / ((1 + self.beta**2) * TP + (self.beta**2) * FN + FP + self.epsilon)
fbeta = fbeta.clamp(min=self.epsilon, max=1 - self.epsilon)
return 1 - fbeta.mean()
An alternative method is described in this paper:
https://arxiv.org/abs/1608.04802
The approach taken optimises for a lower bound on the statistic. Other metrics such as AUROC and AUCPR are also discussed. An implementation in TF of such an approach can be found here:
https://github.com/tensorflow/models/tree/master/research/global_objectives
I think you are confusing model evaluation metrics for classification with training losses.
Accuracy, precision, F-scores etc. are evaluation metrics computed from binary outcomes and binary predictions.
For model training, you need a function that compares a continuous score (your model output) with a binary outcome - like cross-entropy. Ideally, this is calibrated such that it is minimised if the predicted mean matches the population mean (given covariates). These rules are called proper scoring rules, and the cross-entropy is one of them.
Also check the thread is-accuracy-an-improper-scoring-rule-in-a-binary-classification-setting
If you want to weigh positive and negative cases differently, two methods are
oversample the minority class and correct predicted probabilities when predicting on new examples. For fancier methods, check the under sampling module of imbalanced-learn to get an overview.
use a different proper scoring rule for training loss. This allows to e.g. build in asymmetry in how you treat positive and negative cases while preserving calibration. Here is review of the subject.
I recommend just using simple oversampling in practice.
the loss value and accuracy is a different concept. The loss value is used for training the NN. However, accuracy or other metrics is to value the training result.
My model always predict under probability 0.5 for all pixels.
I dropped all images without ships and have tried focal loss,iou loss,weighted loss to deal with imbalance .
But the result is same.After few batches the masks i predicted gradually became all zeros.
Here is my notebook: enter link description here
Kaggle discussion:enter link description here
In the notebook , basically what i did is :
(1)discard all samples where there is no ship
(2)build a plain u-net
(3)define three custom loss function(iouloss,focal_binarycrossentropy,biased_crossentropy), all of which i have tried.
(4)train and submit
#define different losses to try
def iouloss(y_true,y_pred):
intersection = K.sum(y_true * y_pred, axis=-1)
sum_ = K.sum(y_true + y_pred, axis=-1)
jac = intersection / (sum_ - intersection)
return 1 - jac
def focal_binarycrossentropy(y_true,y_pred):
#focal loss with gamma 8
t1=K.binary_crossentropy(y_true, y_pred)
t2=tf.where(tf.equal(y_true,0),t1*(y_pred**8),t1*((1-y_pred)**8))
return t2
def biased_crossentropy(y_true,y_pred):
#apply 1000 times heavier punishment to ship pixels
t1=K.binary_crossentropy(y_true, y_pred)
t2=tf.where(tf.equal(y_true,0),t1*1000,t1)
return t2
...
#try different loss function
unet.compile(loss=iouloss, optimizer="adam", metrics=[ioumetric])
or
unet.compile(loss=focal_binarycrossentropy, optimizer="adam", metrics=[ioumetric])
or
unet.compile(loss=biased_crossentropy, optimizer="adam", metrics=[ioumetric])
...
#start training
unet.train_on_batch(x=image_batch,y=mask_batch)
One option that Keras provides is class_weight parameter in fit from documentation:
class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to "pay more attention" to samples from an under-represented class.
This will allow you to counter the imbalance to some extent.
I have heard use of the Dice coefficient for this problem, although I have no personal experience of having done so. Perhaps you could try this? It is related to the Jaccard but have heard anecdotally that it is easier to train. Sorry not to offer anything more concrete.
Given a TensorFlow tf.while_loop, how can I calculate the gradient of x_out with respect to all weights of the network for each time step?
network_input = tf.placeholder(tf.float32, [None])
steps = tf.constant(0.0)
weight_0 = tf.Variable(1.0)
layer_1 = network_input * weight_0
def condition(steps, x):
return steps <= 5
def loop(steps, x_in):
weight_1 = tf.Variable(1.0)
x_out = x_in * weight_1
steps += 1
return [steps, x_out]
_, x_final = tf.while_loop(
condition,
loop,
[steps, layer_1]
)
Some notes
In my network the condition is dynamic. Different runs are going to run the while loop a different amount of times.
Calling tf.gradients(x, tf.trainable_variables()) crashes with AttributeError: 'WhileContext' object has no attribute 'pred'. It seems like the only possibility to use tf.gradients within the loop is to calculate the gradient with respect to weight_1 and the current value of x_in / time step only without backpropagating through time.
In each time step, the network is going to output a probability distribution over actions. The gradients are then needed for a policy gradient implementation.
You can't ever call tf.gradients inside tf.while_loop in Tensorflow based on this and this, I found this out the hard way when I was trying to create conjugate gradient descent entirely into the Tensorflow graph.
But if I understand your model correctly, you could make your own version of an RNNCell and wrap it in a tf.dynamic_rnn, but the actual cell
implementation will be a little complex since you need to evaluate a condition dynamically at runtime.
For starters, you can take a look at Tensorflow's dynamic_rnn code here.
Alternatively, dynamic graphs have never been Tensorflow's strong suite, so consider using other frameworks like PyTorch or you can try out eager_execution and see if that helps.