If I look into the Keras metrics I see that the values of y_true and y_pred are "just" compared at the end of each epoch for categorical_accuracy:
def categorical_accuracy(y_true, y_pred):
    return K.cast(K.equal(K.argmax(y_true, axis=-1),
                          K.argmax(y_pred, axis=-1)),
                  K.floatx())
How are masked values handled? If I understood correctly, masking prevents the masked values from influencing the training, but the model still produces predictions for the masked positions. In my opinion, they therefore influence the metric.
More explanation on how it influences the metric:
In the padding/masking process, I set the padded/masked values in y_true to an unused class, e.g. class 0.
If argmax() now looks for the maximum in the one-hot encoded y_true, it will just return 0, since the whole (masked) row contains identical values.
I do not have a class 0, as it is my masking value/class, so y_pred and y_true will almost certainly differ there, which reduces the reported accuracy.
Is this already handled somewhere in the Keras metric and I overlooked it?
Otherwise, I would have to create a custom metric or callback that reproduces categorical_accuracy, with the addition that all masked values are removed from y_pred and y_true before comparison.
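A minimal numpy sketch of the behaviour I mean (the numbers are made up just for illustration):

import numpy as np

# one-hot targets for three timesteps; the last row is a padded/masked step (all zeros)
y_true = np.array([[0., 1., 0., 0.],
                   [0., 0., 1., 0.],
                   [0., 0., 0., 0.]])
y_pred = np.array([[0.1, 0.7, 0.1, 0.1],
                   [0.1, 0.1, 0.7, 0.1],
                   [0.2, 0.3, 0.3, 0.2]])

print(np.argmax(y_true, axis=-1))  # [1 2 0] -> the masked row counts as class 0
print(np.argmax(y_pred, axis=-1))  # [1 2 1] -> mismatch on the masked row lowers the accuracy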
Maybe the best answer would be this, from keras.metrics:
A metric function is similar to a loss function, except that the results from evaluating a metric are not used when training the model.
The training is only influenced by the loss function where masking is implemented.
Nevertheless, the displayed results will not be on par with the actual results and can lead to misleading conclusions.
As the metric is not used in the training process, a callback function can solve this.
Something like this (based on Andrew Ng). I search for 0 here because for my masked targets the whole one-hot-encoded row is 0 (no class activated).
import numpy as np
from keras.callbacks import Callback
from sklearn.metrics import accuracy_score

class categorical_accuracy_no_mask(Callback):

    def on_train_begin(self, logs={}):
        self.val_acc = []

    def on_epoch_end(self, epoch, logs={}):
        # self.validation_data is made available to callbacks by older Keras versions
        val_predict = np.asarray(self.model.predict(self.validation_data[0])).round()
        val_targ = self.validation_data[1]
        # flatten (samples, timesteps, classes) so masked timesteps can be removed individually
        val_predict = val_predict.reshape(-1, val_predict.shape[-1])
        val_targ = val_targ.reshape(-1, val_targ.shape[-1])
        # find rows where all targets are zero: those are the masked ones,
        # as we masked the target with 0 and the data with 666
        indx = np.where(~val_targ.any(axis=1))[0]
        y_true_nomask = np.delete(val_targ, indx, axis=0)
        y_pred_nomask = np.delete(val_predict, indx, axis=0)
        _val_accuracy = accuracy_score(y_true_nomask, y_pred_nomask)
        self.val_acc.append(_val_accuracy)
        print(" - val_accuracy: %f" % _val_accuracy)
        return
Of course, now you could also add precision, recall, etc.
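For example (a hedged sketch; precision_score and recall_score come from sklearn.metrics, and average='macro' is an assumption you may want to change):

from sklearn.metrics import precision_score, recall_score

# inside on_epoch_end, after computing y_true_nomask and y_pred_nomask:
_val_precision = precision_score(y_true_nomask, y_pred_nomask, average='macro')
_val_recall = recall_score(y_true_nomask, y_pred_nomask, average='macro')
print(" - val_precision: %f - val_recall: %f" % (_val_precision, _val_recall))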
Related
I am dealing with a semantic segmentation problem where the two classes in which I am interested (in addition to background) are quite unbalanced in the image pixels. I am actually using sparse categorical cross-entropy as a loss, due to the way in which the training masks are encoded. Is there any version of it which takes class weights into account? I have not been able to find one, or even the original source code of sparse_categorical_crossentropy. I have never explored the TF source code before, but the link to the source code from the API page doesn't seem to point to a real implementation of the loss function.
As far as I know you can use class weights in model.fit for any loss function. I have used it with categorical_crossentropy and it works. It just weights the loss with the class weight, so I see no reason it should not work with sparse_categorical_crossentropy.
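A minimal sketch of what I mean, with a made-up model and weight values (x_train and y_train are placeholders, not from your setup):

import tensorflow as tf

# hypothetical model with 3 classes and integer labels
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# class_weight maps class index -> weight applied to the loss of samples of that class
class_weight = {0: 1.0, 1: 2.0, 2: 5.0}
model.fit(x_train, y_train, epochs=5, class_weight=class_weight)  # x_train/y_train assumed to exist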
I think this is the solution to weight sparse_categorical_crossentropy in Keras.
They use the following to add a "second mask" (containing the weights for each class of the mask image) to the dataset.
def add_sample_weights(image, label):
    # The weights for each class, with the constraint that:
    #     sum(class_weights) == 1.0
    class_weights = tf.constant([2.0, 2.0, 1.0])
    class_weights = class_weights / tf.reduce_sum(class_weights)

    # Create an image of `sample_weights` by using the label at each pixel as an
    # index into the `class_weights`.
    sample_weights = tf.gather(class_weights, indices=tf.cast(label, tf.int32))

    return image, label, sample_weights

train_dataset.map(add_sample_weights).element_spec
Then they just use tf.keras.losses.SparseCategoricalCrossentropy as the loss function and fit like this:
weighted_model.fit(
    train_dataset.map(add_sample_weights),
    epochs=1,
    steps_per_epoch=10)
It seems that Keras SparseCategoricalCrossentropy doesn't work with class weights. I have found this implementation of a sparse categorical cross-entropy loss for Keras, which works for me. The implementation in the link had a little bug, which may be due to some version incompatibility, so I have fixed it.
import tensorflow as tf
from tensorflow import keras

class WeightedSCCE(keras.losses.Loss):
    def __init__(self, class_weight, from_logits=False, name='weighted_scce'):
        if class_weight is None or all(v == 1. for v in class_weight):
            self.class_weight = None
        else:
            self.class_weight = tf.convert_to_tensor(class_weight,
                                                     dtype=tf.float32)
        self.name = name
        self.reduction = keras.losses.Reduction.NONE
        self.unreduced_scce = keras.losses.SparseCategoricalCrossentropy(
            from_logits=from_logits, name=name,
            reduction=self.reduction)

    def __call__(self, y_true, y_pred, sample_weight=None):
        loss = self.unreduced_scce(y_true, y_pred, sample_weight)
        if self.class_weight is not None:
            weight_mask = tf.gather(self.class_weight, y_true)
            loss = tf.math.multiply(loss, weight_mask)
        return loss
The loss should be instantiated with the list or array of class weights as an argument.
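For example (a sketch; the weight values and the rest of the compile call are assumptions):

# hypothetical per-class weights for a 3-class problem
class_weights = [2.0, 2.0, 1.0]

model.compile(optimizer="adam",
              loss=WeightedSCCE(class_weight=class_weights),
              metrics=["accuracy"])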
I want to set up a Keras model (tensorflow backend) for a multi-class classification problem with 4 different classes. I have both labeled and unlabeled data.
I have worked out the case in which I only train with the labeled data and my model looks something like this:
# create model
inputs = keras.Input(shape=(len(config.variables), ))
X = layers.Dense(units=200, activation="relu")(inputs)
output = layers.Dense(units=4, activation="softmax", name="output")(X)
model = keras.Model(inputs=inputs, outputs=output)
model.compile(optimizer=optimizers.Adam(1e-4), loss=loss_function, metrics=["accuracy"])

# train model
model.fit(
    x=train_data,
    y=train_class_labels,
    batch_size=200,
    epochs=200,
    verbose=2,
    validation_split=0.2,
    sample_weight=class_weights
)
I have working models with two different losses, namely categorical_crossentropy and sparse_categorical_crossentropy, and depending on the loss function my train_class_labels were either in one-hot representation (e.g. [[0,1,0,0], [0,0,0,1], ...]) or in integer representation (e.g. [0,0,2,1,0,3, ...]), and everything worked fine. class_weights is some weight vector ([0.78, 1.34, ...]).
Now, for my further plans, I need to include the unlabeled data in the training process, but I need it to be ignored by the loss function.
What I have tried:
setting the labels of the unlabeled data to [0,0,0,0] when using categorical_crossentropy as a loss, because I thought my unlabeled data would then be ignored by the loss function. Somehow this changed the predictions after training.
I also tried setting the weights of the unlabeled data to 0, but that did not have an effect either.
I concluded that I need to somehow mark my unlabeled data and customize my loss function so that it can be told to ignore those samples. Something like:
def custom_loss(y_true, y_pred):
    if y_true == labeled data:
        return normal loss function
    if y_true == unlabeled data:
        return 0
These are some snippets that I have found, but they do not seem to work:
def custom_loss(y_true, y_pred):
    loss = losses.sparse_categorical_crossentropy(y_true, y_pred)
    return K.switch(K.flatten(K.equal(y_true, -1)), K.zeros_like(loss), loss)

def custom_loss2(y_true, y_pred):
    idx = tf.not_equal(y_true, -1)
    y_true = tf.boolean_mask(y_true, idx)
    y_pred = tf.boolean_mask(y_pred, idx)
    return losses.sparse_categorical_crossentropy(y_true, y_pred)
In these examples I set the labels of the unlabeled data to -1, so train_class_labels would look something like this: [0,-1,2,0,3, ...]
But when using the first loss function I just get NaNs, and when using the second one I get the following error:
Invalid argument: logits and labels must have the same first dimension, got logits shape [1,5000] and labels shape [5000]
I think that setting the labels to [0,0,0,0] is just fine, because the loss is calculated as the sum of the log losses of your instances per class (in your case the loss would be 0 for instances with no label).
I don't understand why you are inserting unlabeled data into your training in a supervised setting.
I think that the differences you obtain are due to the batch size and the gradient step. If there are instances that do not contribute to gradient descent, the calculated loss is different than before, and that is why you get different predictions.
Basically there would be fewer informative instances per batch.
If you use the size of the whole dataset as the batch size, there would be no difference from a previous training run without the unlabeled instances (provided that run also used batch size = size of the dataset).
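A quick numerical check of the first point (made-up predictions, only the zero/non-zero contrast matters):

import tensorflow as tf

y_pred = tf.constant([[0.7, 0.1, 0.1, 0.1]])

# a normal one-hot label contributes a non-zero loss
print(tf.keras.losses.categorical_crossentropy(
    tf.constant([[1., 0., 0., 0.]]), y_pred).numpy())  # ~0.357

# an all-zero label: -sum(0 * log(p)) == 0, so the sample adds nothing to the loss
print(tf.keras.losses.categorical_crossentropy(
    tf.constant([[0., 0., 0., 0.]]), y_pred).numpy())  # 0.0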
I am trying to build a neural network in tensorflow where the cost of a Type I error (false positive) is higher than that of a Type II error (false negative). Is there a way to impose this during the training process (i.e. by inputting a cost matrix)? This is possible with simple models like logistic regression in scikit-learn by specifying the class_weight parameter:
cw = {0: 3, 1: 1}
clf = LogisticRegression(class_weight=cw)
In this case, incorrectly predicting a 0 is 3x more costly than incorrectly predicting a 1. However, this cannot be performed with a Neural Network, so I want to see if it is possible in tensorflow.
Thanks
You could use tf.nn.weighted_cross_entropy_with_logits and its pos_weight argument.
This argument weights the positive class, as described by the documentation (in TF 2.0 at least):
A value pos_weight > 1 decreases the false negative count, hence increasing the recall.
Conversely setting pos_weight < 1 decreases the false positive count and increases the precision.
In your case, you could create a custom loss function like this:
import tensorflow as tf

# Output logits from your network, not the values after sigmoid activation
class WeightedBinaryCrossEntropy:
    def __init__(self, positive_weight: float):
        self.positive_weight = positive_weight

    def __call__(self, targets, logits, sample_weight=None):
        return tf.nn.weighted_cross_entropy_with_logits(
            targets, logits, pos_weight=self.positive_weight
        )
And create a custom neural network with it, for example using tf.keras (samples are weighted as they were in your question):
import numpy as np

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Dense(32, input_shape=(10,)),
        tf.keras.layers.Activation("relu"),
        tf.keras.layers.Dense(10),
        tf.keras.layers.Activation("relu"),
        # Output one logit for binary classification
        tf.keras.layers.Dense(1),
    ]
)

# Example random data
data = np.random.random((32, 10))
targets = np.random.randint(2, size=32)

# 3 times as costly to make type I error
model.compile(optimizer="rmsprop", loss=WeightedBinaryCrossEntropy(positive_weight=3))
model.fit(data, targets, batch_size=32)
You can use a logarithmic scale. For a 0 incorrectly predicted as 1, y - ŷ = -1 and the log term evaluates to 1.71. For a 1 predicted as 0, y - ŷ = 1 and the log term equals 0.63. For y == ŷ it equals 0. So a 0 incorrectly predicted as 1 is almost three times more costly.
import numpy as np
from math import exp

loss = abs(1 - exp(-np.log(exp(y - ŷ))))
# abs(1 - exp(-np.log(exp(0))))
# Out[53]: 0.0
# abs(1 - exp(-np.log(exp(-1))))
# Out[54]: 1.718281828459045
# abs(1 - exp(-np.log(exp(1))))
# Out[55]: 0.6321205588285577
Then you will have a convex optimization. Implementing:
import keras.backend as K

def custom_loss(y_true, y_pred):
    # use backend ops so this works on tensors; note that log(exp(x)) == x
    return K.mean(K.abs(1 - K.exp(-K.log(K.exp(y_true - y_pred)))))
Then:
model.compile(loss=custom_loss, optimizer=sgd, metrics=['accuracy'])
I am new to tensorflow and have problems defining a custom loss function for a customer churn problem, one that includes a list of values as a penalty.
So far, I have replicated a mean squared error function that penalizes wrong predictions with an integer.
def rfm_penalty(y_true, y_pred):
    penalty = ...  # integer value, to be replaced by a list
    loss = tf.where(tf.less(y_true * y_pred, 0),
                    penalty * tf.square(y_true - y_pred),  # penalize negative (wrong) preds
                    tf.square(y_true - y_pred))            # no penalty for positive preds
    return tf.reduce_mean(loss, axis=-1)
This one works, but I'd like to modify it: previously, I calculated a metric that measures the value of a customer, called RFM (recency, frequency and monetary value of past purchases). This metric is an integer value from 3 to 12 that sums the three scores R, F and M. It is stored in a feature column of my df_train.
df_train['RFM_Score'] = [3,6,5,9,11,12,4,...,4] # same dimensions as y_true
I would like to use this feature column (or list) as the penalty, thus penalizing wrong predictions more heavily for highly valuable customers. I would be happy about any idea on how to do that, even better with a sigmoid function, as it's a binary classification case.
Thank you!
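One possibility, following the per-sample-weight pattern shown earlier in this thread, would be to pass the RFM score (or a sigmoid-squashed version of it) as a sample_weight instead of baking it into the loss. A sketch, where model, X_train and y_train are assumed to exist as in your setup and the scaling is purely illustrative:

import numpy as np

# per-sample penalty from the RFM score (3..12), squashed with a sigmoid (hypothetical scaling)
rfm = df_train['RFM_Score'].to_numpy().astype('float32')
sample_weight = 1.0 / (1.0 + np.exp(-(rfm - rfm.mean())))

model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X_train, y_train, sample_weight=sample_weight, epochs=10)

Keras multiplies each sample's loss by its weight, so mistakes on high-RFM customers count more.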
In the docs, predict_proba(self, x, batch_size=32, verbose=1) is described as
Generates class probability predictions for the input samples batch by batch.
and returns
A Numpy array of probability predictions.
Suppose my model is a binary classification model. Is the output [a, b], where a is the probability of class_0 and b is the probability of class_1?
Here the situation is different and somewhat misleading, especially when you compare the predict_proba method to the sklearn methods of the same name. In Keras (not the sklearn wrappers) the predict_proba method is exactly the same as the predict method. You can even check it here:
def predict_proba(self, x, batch_size=32, verbose=1):
    """Generates class probability predictions for the input samples
    batch by batch.

    # Arguments
        x: input data, as a Numpy array or list of Numpy arrays
            (if the model has multiple inputs).
        batch_size: integer.
        verbose: verbosity mode, 0 or 1.

    # Returns
        A Numpy array of probability predictions.
    """
    preds = self.predict(x, batch_size, verbose)
    if preds.min() < 0. or preds.max() > 1.:
        warnings.warn('Network returning invalid probability values. '
                      'The last layer might not normalize predictions '
                      'into probabilities '
                      '(like softmax or sigmoid would).')
    return preds
So, in a binary classification case, the output you get depends on the design of your network:
if the final output of your network is a single sigmoid output, then the output of predict_proba is simply the probability assigned to class 1.
if the final output of your network is a two-dimensional output to which you apply a softmax function, then the output of predict_proba is a pair [a, b] where a = P(class(x) = 0) and b = P(class(x) = 1).
This second method is rarely used and there are some theoretical advantages to using the first method, but I wanted to inform you, just in case.
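A small sketch of the two designs (made-up layer sizes, just to show the shape of what predict/predict_proba returns):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

x = np.random.random((4, 8))

# design 1: single sigmoid output -> shape (4, 1), each value is P(class = 1)
sigmoid_model = Sequential([Dense(16, activation='relu', input_shape=(8,)),
                            Dense(1, activation='sigmoid')])
print(sigmoid_model.predict(x).shape)   # (4, 1)

# design 2: two softmax outputs -> shape (4, 2), each row is [P(class = 0), P(class = 1)]
softmax_model = Sequential([Dense(16, activation='relu', input_shape=(8,)),
                            Dense(2, activation='softmax')])
print(softmax_model.predict(x).shape)   # (4, 2)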
It depends on how you specify the output of your model and your targets. It can be both. Usually, when one is doing binary classification, the output is a single value which is the probability of the positive prediction. One minus the output is the probability of the negative prediction.