Deal with imbalanced dataset in text classification with Keras and Theano - python

In a dataset of ~20,000 texts, the true and false samples number ~5,000 and ~15,000 respectively. A two-channel textCNN built with Keras and Theano does the classification, and F1 score is the evaluation metric. The F1 score is not bad, but the confusion matrix shows that the accuracy on the true samples is relatively low (~40%). Predicting the true samples accurately is what matters most here. I therefore want to design a custom binary cross-entropy loss function that increases the weight of misclassified true samples, so that the model focuses more on predicting the true samples correctly.
I tried class_weight (computed with sklearn) in the model.fit method, but it did not work very well, since the weight is applied to all samples of a class rather than only to the misclassified ones.
I tried and adjusted the method mentioned here: https://github.com/keras-team/keras/issues/2115, but that loss function is a categorical cross-entropy and it did not work well for the binary classification problem. I tried to modify the loss function into a binary one but ran into issues with the input dimensions.
The sample code of the cost-sensitive loss function focusing on the misclassified samples is:
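from itertools import product
from keras import backend as K

def w_categorical_crossentropy(y_true, y_pred, weights):
    # weights[c_t, c_p] is the cost of predicting class c_p when the true class is c_t
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    # mask that marks the predicted (highest-probability) class of each sample
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    y_pred_max_mat = K.equal(y_pred, y_pred_max)
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    # argument order (output, target) as in Keras 1.x; Keras 2.x expects (target, output)
    return K.categorical_crossentropy(y_pred, y_true) * final_mask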
A custom loss function for binary classification, implemented with Keras and Theano and focused on the misclassified samples, would be very valuable for this imbalanced dataset. Please help troubleshoot this. Thanks!
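To make the goal concrete, here is a minimal NumPy sketch of a binary cross-entropy that up-weights only the true samples the model currently gets wrong. The function name, the fn_weight parameter and the 0.5 threshold are assumptions for illustration, not part of Keras. In a real Keras/Theano loss the same arithmetic would have to be written with backend tensor operations.

```python
import numpy as np

def misclassified_positive_bce(y_true, y_pred, fn_weight=5.0, eps=1e-7):
    """Binary cross-entropy with extra weight on missed positives.

    y_true: 0/1 labels; y_pred: predicted probabilities of class 1.
    fn_weight (hypothetical parameter) multiplies the loss of samples
    whose true label is 1 but whose predicted probability is below 0.5.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    # Per-sample binary cross-entropy
    bce = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    # Only positives that are currently misclassified get the larger weight
    missed_positive = (y_true == 1.0) & (y_pred < 0.5)
    weights = np.where(missed_positive, fn_weight, 1.0)
    return float(np.mean(weights * bce))

y = [1, 1, 0, 0]
p = [0.9, 0.2, 0.1, 0.6]
plain = misclassified_positive_bce(y, p, fn_weight=1.0)
weighted = misclassified_positive_bce(y, p, fn_weight=5.0)
```

Only the second sample (a true sample predicted at 0.2) contributes more to the weighted loss; the misclassified false sample keeps weight 1.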

When I have to deal with imbalanced datasets in Keras, I first compute a weight for each class and pass the weights to the model during training. It looks something like this:
import numpy as np
from sklearn.utils import compute_class_weight
classes = np.unique(targets)
# newer scikit-learn releases require classes and y as keyword arguments
w = compute_class_weight('balanced', classes=classes, y=targets)
# here I am adding only two categories with their corresponding weights
# you can spin a loop or continue by hand until you include all of your categories
weights = {
    classes[0]: w[0],  # first class and its weight
    classes[1]: w[1],  # second class and its weight
}
# then pass them during training, along with your other fit arguments
model.fit(x=features, y=targets, class_weight=weights)
I believe this will solve your problem.
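For intuition, the 'balanced' heuristic assigns each class the weight n_samples / (n_classes * class_count). The following sketch reproduces that formula with the standard library only, using the class sizes from the question; the helper name is hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight per class: n_samples / (n_classes * count_of_that_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Class sizes from the question: ~5,000 true (1) and ~15,000 false (0) samples
labels = [1] * 5000 + [0] * 15000
weights = balanced_class_weights(labels)
# weights[1] is 2.0 and weights[0] is about 0.667
```

Every sample of a class receives the same weight, which is why this approach cannot single out the misclassified samples.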

Related

Ensemble stacking model... with different inputs (molecular Fingerprint) in one type of keras model

I am a chemist and still learning ML.
I have trained 7 different Keras models, each using a different type of molecular fingerprint as features, to predict a property; however, the accuracy was not that good.
So I used code from a tutorial I found online:
def optimized_weights(prd, y_fold):
    # define bounds on each weight
    bound_w = [(0.0, 1.0) for _ in range(n_members)]
    # arguments to the loss function
    search_arg = (prd, y_fold)
    # global optimization of ensemble weights
    result = differential_evolution(loss_function, bound_w, search_arg, maxiter=2000, tol=0.0001)
    # get the chosen weights
    weights = normalize(result['x'])
    return weights
def weighted_accuracy(prd, weights, y_fold):
    summed = tensordot(prd, weights, axes=((0), (0)))
    yhat = np.round(summed)
    score = accuracy_score(y_fold, yhat)
    f1 = f1_score(y_fold, yhat)
    fpr, tpr, thresholds = roc_curve(y_fold, summed, pos_label=1)
    auc_test = auc(fpr, tpr)
    # rows are true labels, columns are predictions: [[TN, FP], [FN, TP]]
    conf_matrix = confusion_matrix(y_fold, yhat)
    total = sum(sum(conf_matrix))
    # with 1 as the positive label, sensitivity = TP / (TP + FN)
    sensitivity = conf_matrix[1, 1] / (conf_matrix[1, 0] + conf_matrix[1, 1])
    # specificity = TN / (TN + FP)
    specificity = conf_matrix[0, 0] / (conf_matrix[0, 0] + conf_matrix[0, 1])
    return score, auc_test, sensitivity, specificity, f1
For the weighted-average ensemble, I trained the models on 80% of the data and used the remaining 20% to find optimized weights with differential_evolution (from scipy) for maximum accuracy, but I think this accuracy is biased toward the test data, because the same 20% is used both to choose the weights and to report the score.
I also repeated the same process with 5-fold cross-validation and computed the average accuracy.
Is this acceptable? If not, please tell me what I can do.
Thanks
DeepStack offers an interface for stacking and "ensembling" Keras models. It also offers performance tests based on validation data out of the box.
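One way to address the bias concern raised in the question is to choose the ensemble weights on a validation split and report accuracy on a separate test split that played no part in the choice. The sketch below assumes two ensemble members, a simple grid search instead of differential_evolution, and made-up toy predictions; all names and numbers are hypothetical.

```python
import numpy as np

def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    w = np.asarray(weights, dtype=float)
    return w / w.sum()

def ensemble_accuracy(prd, weights, y):
    """Accuracy of the weighted average of member probabilities.

    prd has shape (n_members, n_samples); predictions are thresholded at 0.5.
    """
    summed = np.tensordot(prd, normalize(weights), axes=((0), (0)))
    return float(np.mean((summed >= 0.5).astype(int) == np.asarray(y)))

def pick_weights(prd_val, y_val, steps=11):
    """Grid-search two-member weights using the validation split only."""
    candidates = [(w, 1.0 - w) for w in np.linspace(0.0, 1.0, steps)]
    return max(candidates, key=lambda c: ensemble_accuracy(prd_val, c, y_val))

# Toy member predictions (rows: members, columns: samples)
prd_val = np.array([[0.9, 0.8, 0.3, 0.2], [0.4, 0.6, 0.7, 0.1]])
y_val = np.array([1, 1, 0, 0])
prd_test = np.array([[0.7, 0.4, 0.6, 0.1], [0.8, 0.3, 0.2, 0.3]])
y_test = np.array([1, 0, 1, 0])

best = pick_weights(prd_val, y_val)                   # chosen on validation data
val_acc = ensemble_accuracy(prd_val, best, y_val)     # optimistic estimate
test_acc = ensemble_accuracy(prd_test, best, y_test)  # unbiased by weight selection
```

With cross-validation, the same idea means tuning the weights inside each training fold and scoring on the fold that was held out.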

For a classification model in tensorflow, is there a way to impose an asymmetric cost function during the training?

I am trying to build a neural network in TensorFlow where a Type I error (false positive) is more costly than a Type II error (false negative). Is there a way to impose this during the training process (i.e. by inputting a cost matrix)? This is possible with simple models like logistic regression in scikit-learn by specifying the class_weight parameter.
cw = {0: 3,1:1}
clf = LogisticRegression(class_weight = cw )
In this case, misclassifying a sample whose true class is 0 is 3x more costly than misclassifying one whose true class is 1. However, I do not see how to do the same with a neural network, so I want to see if it is possible in TensorFlow.
Thanks
You could use tf.nn.weighted_cross_entropy_with_logits and its pos_weight argument.
This argument weights the positive class, as described by the documentation (in TF2.0 at least):
A value pos_weight > 1 decreases the false negative count, hence increasing the recall.
Conversely setting pos_weight < 1 decreases the false positive count and increases the precision.
In your case, you could create a custom loss function like this:
import tensorflow as tf
# Output logits from your network, not the values after sigmoid activation
class WeightedBinaryCrossEntropy:
    def __init__(self, positive_weight: float):
        self.positive_weight = positive_weight
    def __call__(self, targets, logits, sample_weight=None):
        return tf.nn.weighted_cross_entropy_with_logits(
            targets, logits, pos_weight=self.positive_weight
        )
And create a custom neural network with it, for example using tf.keras (samples are weighted as they were in your question):
import numpy as np
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Dense(32, input_shape=(10,)),
        tf.keras.layers.Activation("relu"),
        tf.keras.layers.Dense(10),
        tf.keras.layers.Activation("relu"),
        # Output one logit for binary classification
        tf.keras.layers.Dense(1),
    ]
)
# Example random data
data = np.random.random((32, 10))
targets = np.random.randint(2, size=32)
# Type I errors (false positives) 3 times as costly: per the documentation quoted
# above this needs pos_weight < 1, so the positive class is weighted by 1/3
model.compile(optimizer="rmsprop", loss=WeightedBinaryCrossEntropy(positive_weight=1 / 3))
model.fit(data, targets, batch_size=32)
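To see what pos_weight does numerically, the documented formula, pos_weight * labels * -log(sigmoid(logits)) + (1 - labels) * -log(1 - sigmoid(logits)), can be reproduced in NumPy. This is a sketch for checking values by hand, not the TensorFlow implementation; the function name is hypothetical.

```python
import numpy as np

def weighted_bce_with_logits(labels, logits, pos_weight):
    """Per-element weighted cross-entropy computed from raw logits."""
    labels = np.asarray(labels, dtype=float)
    logits = np.asarray(logits, dtype=float)
    # Numerically stable log(sigmoid(x)) and log(1 - sigmoid(x))
    log_sigmoid = -np.logaddexp(0.0, -logits)
    log_one_minus_sigmoid = -np.logaddexp(0.0, logits)
    return -(pos_weight * labels * log_sigmoid
             + (1.0 - labels) * log_one_minus_sigmoid)

# Same uncertain logit (0.0) for one positive and one negative example
losses = weighted_bce_with_logits([1, 0], [0.0, 0.0], pos_weight=3.0)
# The positive example costs 3 * log(2), the negative one log(2)
```

Only the term for positive labels is scaled, so values below 1 make errors on negatives (false positives) relatively more expensive.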
You can use a logarithmic scale. For a 0 incorrectly predicted as 1, y - ŷ = -1 and the loss evaluates to 1.72. For a 1 predicted as 0, y - ŷ = 1 and the loss equals 0.63. For y == ŷ the loss equals 0. A 0 incorrectly predicted as 1 is therefore almost three times as costly.
import numpy as np
from math import exp
loss=abs(1-exp(-np.log(exp(y-ŷ))))
#abs(1-exp(-np.log(exp(0))))
#Out[53]: 0.0
#abs(1-exp(-np.log(exp(-1))))
#Out[54]: 1.718281828459045
#abs(1-exp(-np.log(exp(1))))
#Out[55]: 0.6321205588285577
Then you will have a convex optimization. Implementing:
import keras.backend as K
def custom_loss(y_true, y_pred):
    # np.log(exp(d)) is just d; backend ops are needed because the inputs are tensors
    return K.mean(K.abs(1 - K.exp(-(y_true - y_pred))))
Then:
model.compile(loss=custom_loss, optimizer=sgd, metrics=['accuracy'])
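The three values quoted in this answer can be checked with plain Python. Since log(exp(d)) equals d, the expression reduces to |1 - exp(-(y - ŷ))|; the function name below is hypothetical.

```python
import math

def asymmetric_loss(y_true, y_pred):
    """|1 - exp(-(y_true - y_pred))| for scalar inputs."""
    diff = y_true - y_pred
    return abs(1.0 - math.exp(-diff))

correct = asymmetric_loss(1, 1)         # y == y_hat
false_positive = asymmetric_loss(0, 1)  # 0 predicted as 1, diff = -1
false_negative = asymmetric_loss(1, 0)  # 1 predicted as 0, diff = +1
ratio = false_positive / false_negative
```

The ratio between the two error costs is exactly e (about 2.72), which is where "almost three times" comes from.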

Adjust custom loss function for gradient boosting classification

I have implemented a gradient boosting decision tree to do a multiclass classification. My custom loss functions look like this:
import numpy as np
from sklearn.preprocessing import OneHotEncoder
def softmax(mat):
    res = np.exp(mat)
    res = np.multiply(res, 1/np.sum(res, axis=1, keepdims=True))
    return res
def custom_asymmetric_objective(y_true, y_pred_encoded):
    pred = y_pred_encoded.reshape((-1, 3), order='F')
    pred = softmax(pred)
    y_true = OneHotEncoder(sparse=False,categories='auto').fit_transform(y_true.reshape(-1, 1))
    grad = (pred - y_true).astype("float")
    hess = 2.0 * pred * (1.0-pred)
    return grad.flatten('F'), hess.flatten('F')
def custom_asymmetric_valid(y_true, y_pred_encoded):
    y_true = OneHotEncoder(sparse=False,categories='auto').fit_transform(y_true.reshape(-1, 1)).flatten('F')
    margin = (y_true - y_pred_encoded).astype("float")
    loss = margin*10
    return "custom_asymmetric_eval", np.mean(loss), False
Everything works, but now I want to adjust my loss function in the following way: it should "penalize" an item that is classified incorrectly, and a penalty should be added for a certain constraint (this is calculated beforehand; let's just say the penalty is e.g. 0.05, so just a real number).
Is there any way to consider both, the misclassification and the penalty value?
Try L2 regularization: the weights are updated by subtracting the learning rate times the error times x, plus the contribution of the penalty term, lambda times the weight to the power of 2.
Simplifying:
This will be the effect:
ADDED: The penalty term (on the right of the equation) increases the generalization power of your model. If you overfit your model on the training set, the performance will be poor on the test set. So you penalize those "right" classifications on the training set that generate error on the test set and compromise generalization.
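For reference, the textbook gradient-descent update with an L2 penalty can be written as below. This is the standard form and only an assumption about what the answer describes; here L is the unregularized loss, w the weights, η the learning rate and λ the penalty strength.

```latex
\begin{align}
\tilde{L}(w) &= L(w) + \lambda \lVert w \rVert_2^2 \\
\nabla_w \tilde{L}(w) &= \nabla_w L(w) + 2\lambda w \\
w &\leftarrow w - \eta \left( \nabla_w L(w) + 2\lambda w \right)
   = (1 - 2\eta\lambda)\, w - \eta\, \nabla_w L(w)
\end{align}
```

The last line shows the effect: each step shrinks the weights by the factor (1 - 2ηλ) before applying the usual gradient step.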

How to use F-score as error function to train neural networks?

I am pretty new to neural networks. I am training a network in TensorFlow, but the number of positive examples in my dataset is far smaller than the number of negative examples (it is a medical dataset).
So, I know that F-score calculated from precision and recall is a good measure of how well the model is trained.
I have used error functions like cross-entropy loss or MSE before, but they are all based on accuracy calculation (if I am not wrong). But how do I use this F-score as an error function? Is there a tensorflow function for that? Or I have to create a new one?
Thanks in advance.
It appears that approaches for optimising directly for these types of metrics have been devised and used successfully, improving scoring and/or training times:
https://www.kaggle.com/c/human-protein-atlas-image-classification/discussion/77289
https://www.kaggle.com/c/human-protein-atlas-image-classification/discussion/70328
https://www.kaggle.com/rejpalcz/best-loss-function-for-f1-score-metric
One such method involves using sums of probabilities, in place of counts, for the true positives, false positives, and false negatives. For example, an F-beta loss (the generalisation of F1) can be calculated with Torch in Python as follows:
def forward(self, y_logits, y_true):
    y_pred = self.sigmoid(y_logits)
    TP = (y_pred * y_true).sum(dim=1)
    # false positives: probability mass predicted positive on true negatives
    FP = (y_pred * (1 - y_true)).sum(dim=1)
    # false negatives: probability mass predicted negative on true positives
    FN = ((1 - y_pred) * y_true).sum(dim=1)
    fbeta = (1 + self.beta**2) * TP / ((1 + self.beta**2) * TP + (self.beta**2) * FN + FP + self.epsilon)
    fbeta = fbeta.clamp(min=self.epsilon, max=1 - self.epsilon)
    return 1 - fbeta.mean()
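The same soft F-beta idea can be sketched as a standalone NumPy function, which is convenient for checking values by hand. This is a translation under assumptions: inputs have shape (batch, n_labels), and beta and epsilon become function arguments with made-up defaults rather than module attributes.

```python
import numpy as np

def soft_fbeta_loss(y_logits, y_true, beta=1.0, epsilon=1e-6):
    """1 - mean soft F-beta, using summed probabilities instead of counts."""
    y_logits = np.asarray(y_logits, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = 1.0 / (1.0 + np.exp(-y_logits))      # sigmoid
    tp = (y_pred * y_true).sum(axis=1)            # soft true positives
    fp = (y_pred * (1.0 - y_true)).sum(axis=1)    # soft false positives
    fn = ((1.0 - y_pred) * y_true).sum(axis=1)    # soft false negatives
    b2 = beta ** 2
    fbeta = (1.0 + b2) * tp / ((1.0 + b2) * tp + b2 * fn + fp + epsilon)
    fbeta = np.clip(fbeta, epsilon, 1.0 - epsilon)
    return float(1.0 - fbeta.mean())

targets = [[1, 0, 1]]
good = soft_fbeta_loss([[10.0, -10.0, 10.0]], targets)   # confident and correct
bad = soft_fbeta_loss([[-10.0, 10.0, -10.0]], targets)   # confident and wrong
```

Because every term is a smooth function of the logits, the loss has usable gradients, unlike an F-score computed from thresholded predictions.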
An alternative method is described in this paper:
https://arxiv.org/abs/1608.04802
The approach taken optimises for a lower bound on the statistic. Other metrics such as AUROC and AUCPR are also discussed. An implementation in TF of such an approach can be found here:
https://github.com/tensorflow/models/tree/master/research/global_objectives
I think you are confusing model evaluation metrics for classification with training losses.
Accuracy, precision, F-scores etc. are evaluation metrics computed from binary outcomes and binary predictions.
For model training, you need a function that compares a continuous score (your model output) with a binary outcome - like cross-entropy. Ideally, this is calibrated such that it is minimised if the predicted mean matches the population mean (given covariates). These rules are called proper scoring rules, and the cross-entropy is one of them.
Also check the thread is-accuracy-an-improper-scoring-rule-in-a-binary-classification-setting
If you want to weigh positive and negative cases differently, two methods are:
(1) Oversample the minority class and correct the predicted probabilities when predicting on new examples. For fancier methods, check the under-sampling module of imbalanced-learn to get an overview.
(2) Use a different proper scoring rule as the training loss. This allows you to, e.g., build in asymmetry in how you treat positive and negative cases while preserving calibration. Here is a review of the subject.
I recommend just using simple oversampling in practice.
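A minimal sketch of the oversampling route is shown below, under the assumption that the positive class is the minority and is resampled with replacement until both classes are equal in size. If positives end up k times as frequent as before, the predicted odds are inflated by k, so a probability p can be mapped back with p / (p + k * (1 - p)). Function names are hypothetical.

```python
import numpy as np

def oversample_minority(X, y, seed=0):
    """Resample the smaller class with replacement until classes are balanced."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.concatenate([majority, minority, extra])
    return X[idx], y[idx]

def correct_probability(p, k):
    """Undo the prior shift after positives were oversampled by a factor k."""
    p = np.asarray(p, dtype=float)
    return p / (p + k * (1.0 - p))

X = np.arange(8).reshape(8, 1)
y = np.array([1, 1, 0, 0, 0, 0, 0, 0])
X_bal, y_bal = oversample_minority(X, y)
k = (y_bal == 1).sum() / (y == 1).sum()   # positives grew from 2 to 6, so k = 3
p_original_scale = float(correct_probability(0.5, k))
```

A prediction of 0.5 on the balanced data corresponds to 0.25 on the original class distribution in this toy case.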
Loss and accuracy are different concepts. The loss value is used for training the NN, whereas accuracy and other metrics are used to evaluate the training result.

Kaggle airbus ship detection challenge. How to deal with class imbalance?

My model always predicts a probability under 0.5 for all pixels.
I dropped all images without ships and have tried focal loss, IoU loss, and weighted loss to deal with the imbalance.
But the result is the same. After a few batches the masks I predict gradually become all zeros.
In my notebook, basically what I did is:
(1) discard all samples where there is no ship
(2) build a plain U-Net
(3) define three custom loss functions (iouloss, focal_binarycrossentropy, biased_crossentropy), all of which I have tried
(4) train and submit
#define different losses to try
def iouloss(y_true, y_pred):
    intersection = K.sum(y_true * y_pred, axis=-1)
    sum_ = K.sum(y_true + y_pred, axis=-1)
    jac = intersection / (sum_ - intersection)
    return 1 - jac
def focal_binarycrossentropy(y_true, y_pred):
    #focal loss with gamma 8
    t1 = K.binary_crossentropy(y_true, y_pred)
    t2 = tf.where(tf.equal(y_true, 0), t1 * (y_pred**8), t1 * ((1 - y_pred)**8))
    return t2
def biased_crossentropy(y_true, y_pred):
    #apply 1000 times heavier punishment to ship pixels
    #note: as written, the factor applies where y_true == 0, i.e. to background pixels
    t1 = K.binary_crossentropy(y_true, y_pred)
    t2 = tf.where(tf.equal(y_true, 0), t1 * 1000, t1)
    return t2
...
#try different loss function
unet.compile(loss=iouloss, optimizer="adam", metrics=[ioumetric])
or
unet.compile(loss=focal_binarycrossentropy, optimizer="adam", metrics=[ioumetric])
or
unet.compile(loss=biased_crossentropy, optimizer="adam", metrics=[ioumetric])
...
#start training
unet.train_on_batch(x=image_batch,y=mask_batch)
One option that Keras provides is the class_weight parameter of fit. From the documentation:
class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to "pay more attention" to samples from an under-represented class.
This will allow you to counter the imbalance to some extent.
I have heard of the Dice coefficient being used for this problem, although I have no personal experience of doing so. Perhaps you could try it? It is related to the Jaccard index, and I have heard anecdotally that it is easier to train with. Sorry not to offer anything more concrete.
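As a rough illustration of the Dice suggestion, here is a soft Dice loss in NumPy computed over a whole mask. The smoothing constant is an assumption (a common way to avoid division by zero on empty masks), and a version usable for training would need the same arithmetic written with Keras backend operations.

```python
import numpy as np

def soft_dice_loss(y_true, y_pred, smooth=1.0):
    """1 - Dice coefficient over all pixels of a mask.

    y_true: binary mask; y_pred: predicted probabilities of the same shape.
    """
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    intersection = np.sum(y_true * y_pred)
    dice = (2.0 * intersection + smooth) / (y_true.sum() + y_pred.sum() + smooth)
    return float(1.0 - dice)

mask = np.array([[0, 1], [1, 0]])
perfect = soft_dice_loss(mask, mask)                    # prediction equals the mask
all_zero = soft_dice_loss(mask, np.zeros_like(mask))    # the collapsed prediction
```

Unlike a per-pixel cross-entropy, an all-zero prediction is scored poorly here (a loss of 2/3 in this toy case) because the overlap with the ship pixels is what counts.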
