This is a continuation of a question:
https://datascience.stackexchange.com/questions/22814/class-weighting-during-validation-in-keras
class_weight can be used in the Keras fit function to tell the optimizer to weight the under-represented class. According to the answer on Stack Exchange, it is also considered during validation. For example, if my class ratio is 10 negatives for every 1 positive, then an accuracy score of 0.8 is not so good (a classifier fixed on predicting negatives will do better). I have two questions:
How exactly is class_weight considered during validation?
How can I use class_weight in fit_generator? Is it the same parameter as in fit?
For your first question, it is considered the same way as during training.
Basically, if you look at the function weighted_masked_objective, the individual sample losses are multiplied by the weights and the mean is returned. (Note: Keras does not automatically set the class weights; you need to pass the weights to model.fit() or model.fit_generator().)
Class weights can be computed as the inverse of class frequency, e.g. using sklearn.
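For instance, a minimal sketch with sklearn's compute_class_weight (the toy labels here are made up for illustration):

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_train = np.array([0] * 1000 + [1] * 100)  # 10:1 imbalance, toy labels

# 'balanced' uses n_samples / (n_classes * np.bincount(y)) per class
weights = compute_class_weight(class_weight="balanced",
                               classes=np.unique(y_train),
                               y=y_train)
class_weight = dict(zip(np.unique(y_train), weights))
print(class_weight)  # {0: 0.55, 1: 5.5}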
fit_generator is identical to fit, except that it takes a generator as input.
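A rough end-to-end sketch, assuming a TF/Keras version where fit_generator is still available (it is deprecated in recent TF 2.x, where fit accepts generators directly); the model and generator are toy stand-ins:

import numpy as np
from tensorflow import keras

# Toy imbalanced data: 10 negatives for every positive
x = np.random.rand(1100, 4).astype("float32")
y = np.concatenate([np.zeros(1000), np.ones(100)])

model = keras.Sequential([
    keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

def batch_gen(x, y, batch_size=32):
    while True:  # Keras generators loop indefinitely
        idx = np.random.randint(0, len(x), batch_size)
        yield x[idx], y[idx]

# class_weight is passed exactly as with fit(): a dict {class_index: weight}
model.fit_generator(batch_gen(x, y),
                    steps_per_epoch=len(x) // 32,
                    epochs=2,
                    class_weight={0: 1.0, 1: 10.0})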
Related
I am trying to train a Deep Neural Network (DNN) with labeled data. The labels are encoded such that they only contain values 0 and 1. The shape of the encoded label is 5 x 5 x 232. About 95% of the values in the label are 0 and the rest are 1. Currently, I am using the binary_crossentropy loss function to train the network.
What is the best technique to train the DNN in such a scenario? Is the choice of binary_crossentropy as the loss function appropriate in this case? Any suggestions to improve the performance of the model?
You can try MSE loss. If you want to stick to binary cross-entropy (used in binary classification), consider using label smoothing.
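For example, a minimal sketch of label smoothing with tf.keras (the tiny model here is illustrative only):

import tensorflow as tf

# label_smoothing=0.1 maps hard 0/1 targets to 0.05/0.95, which can
# reduce overconfidence when positives are very sparse
model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="sigmoid",
                                                   input_shape=(4,))])
model.compile(optimizer="adam",
              loss=tf.keras.losses.BinaryCrossentropy(label_smoothing=0.1))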
You may use two other alternative loss functions instead of binary cross-entropy. They are:
Hinge Loss
An alternative to cross-entropy for binary classification problems is the hinge loss function, primarily developed for use with Support Vector Machine (SVM) models.
It is intended for use with binary classification where the target values are in the set {-1, 1}.
Squared Hinge Loss (the square of the hinge loss, which penalizes larger margin violations more strongly)
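A minimal sketch of using these losses in Keras; note the tanh output to match the {-1, 1} target convention (the toy model is illustrative only):

import tensorflow as tf

# tanh output in [-1, 1] matches the {-1, 1} target convention;
# Keras converts 0/1 labels to -1/1 for the hinge losses internally
model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="tanh",
                                                   input_shape=(4,))])
model.compile(optimizer="adam", loss="squared_hinge")  # or loss="hinge"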
For more detail on these loss functions, with examples, see the linked article.
Hope this helps, happy learning.
binary_crossentropy as the loss is fine.
Don't use accuracy as your metric, because the model can predict everything as label 0 and still get 95% accuracy. Instead, use the F1 score (or precision and recall).
Use a weighted loss: i.e., penalize errors on class 1 (the minority) more heavily than errors on class 0.
Instead of class weights you can also use methods like oversampling of the minority class (techniques like SMOTE).
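For example, a sketch with SMOTE, assuming the imbalanced-learn package is installed (the toy data is illustrative):

import numpy as np
from imblearn.over_sampling import SMOTE  # requires imbalanced-learn

X = np.random.rand(1100, 4)
y = np.array([0] * 1000 + [1] * 100)

# Synthesize new minority samples until both classes have equal counts
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(np.bincount(y_res))  # [1000 1000]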
How to calculate class weight
You can use sklearn.utils.class_weight to calculate weights from your labels; check the linked answer.
In such scenarios where you have highly imbalanced data, I would suggest going with a Random Forest with up-sampling. This approach up-samples the minority class and hence can improve the model's performance.
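A rough sketch of this approach with sklearn (the toy data and hyperparameters are illustrative, not a tuned recipe):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

X = np.random.rand(1100, 4)
y = np.array([0] * 1000 + [1] * 100)

# Up-sample the minority class (with replacement) to the majority count
minority = np.where(y == 1)[0]
majority = np.where(y == 0)[0]
up_idx = resample(minority, replace=True, n_samples=len(majority),
                  random_state=0)
idx = np.concatenate([majority, up_idx])

clf = RandomForestClassifier(random_state=0).fit(X[idx], y[idx])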
I have a multi-class classification problem and I want to measure AUC on training and test data.
tf.keras has implemented an AUC metric (tf.keras.metrics.AUC), but I'm not able to tell whether this metric can safely be used in multi-class problems. Even the example "Classification on imbalanced data" on the official web page is dedicated to a binary classification problem.
I have implemented a CNN model that predicts six classes, with a softmax layer that gives the probabilities of all the classes. I used this metric as follows:
self.model.compile(loss='categorical_crossentropy',
                   optimizer=Adam(hp.get("learning_rate")),
                   metrics=['accuracy', AUC()])
and the code was executed without any problem. However, sometimes I see results that seem quite strange to me. For example, the model reported an accuracy of 0.78333336 and an AUC of 0.97327775. Is this possible? Can a model have such a low accuracy and such a high AUC?
I wonder whether, although the code does not give any error, the AUC metric is being computed incorrectly.
Can somebody confirm whether or not this metric supports multi-class classification problems?
tf.keras.metrics.AUC has a boolean argument called multi_label.
If True (not the default), multi-label data will be treated as such, and so AUC is computed separately for each label and then averaged across labels.
When False (the default), the data will be flattened into a single label before AUC computation. In the latter case, when multi-label data is passed to AUC, each label-prediction pair is treated as an individual data point.
The documentation recommends to set it to False for multi-class data.
e.g. tf.keras.metrics.AUC(multi_label=True) for multi-label data.
See the AUC Documentation for more details.
Yes, a model's AUC can be higher than its accuracy.
Additionally, you can use the ROC curve to decide the cutoff threshold for a binary classifier (this cutoff is 0.5 by default). Though there are more technical ways to decide this cutoff, you could simply increase it from 0 to 1 to find the value which maximizes your accuracy (this is a naive solution, and I recommend you read https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/NCSS/One_ROC_Curve_and_Cutoff_Analysis.pdf for an in-depth explanation of cutoff analysis).
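A naive sketch of such a cutoff sweep (the toy labels and probabilities are made up):

import numpy as np

y_true = np.array([0, 0, 0, 1, 1])               # made-up labels
y_prob = np.array([0.2, 0.4, 0.45, 0.55, 0.9])   # made-up P(class 1)

# Sweep candidate cutoffs and keep the accuracy-maximizing one
thresholds = np.linspace(0.0, 1.0, 101)
accs = [np.mean((y_prob >= t).astype(int) == y_true) for t in thresholds]
best = thresholds[int(np.argmax(accs))]
print(best, max(accs))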
I'm trying to understand the difference between RidgeClassifier and LogisticRegression in sklearn.linear_model. I couldn't find it in the documentation.
I think I understand quite well what LogisticRegression does. It computes the coefficients and intercept that minimise half the sum of squares of the coefficients plus C times the binary cross-entropy loss, where C is the regularisation parameter. I checked against a naive implementation from scratch, and the results coincide.
Results of RidgeClassifier differ, and I couldn't figure out how the coefficients and intercept are computed there. Looking at the GitHub code, I'm not experienced enough to untangle it.
The reason I'm asking is that I like the RidgeClassifier results -- it generalises a bit better on my problem. But before I use it, I would like to at least have an idea of where it comes from.
Thanks for any help.
RidgeClassifier() works differently compared to LogisticRegression() with l2 penalty. The loss function for RidgeClassifier() is not cross entropy.
RidgeClassifier() uses Ridge() regression model in the following way to create a classifier:
Let us consider binary classification for simplicity.
Convert the target variable into +1 or -1 based on the class it belongs to.
Build a Ridge() model (which is a regression model) to predict this transformed target. The loss function is MSE + an l2 penalty.
If the Ridge() regression's prediction (computed via decision_function()) is greater than 0, predict the positive class, else the negative class.
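A minimal sketch to check this equivalence on a toy dataset (not the actual sklearn internals, just the recipe above):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Ridge, RidgeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = RidgeClassifier(alpha=1.0).fit(X, y)

# Manual equivalent: regress on {-1, +1} targets, threshold at 0
y_pm = 2 * y - 1
reg = Ridge(alpha=1.0).fit(X, y_pm)
manual_pred = (reg.predict(X) > 0).astype(int)

print(np.array_equal(clf.predict(X), manual_pred))  # expected: True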
For multi-class classification:
Use LabelBinarizer() to create a multi-output regression scenario, and then train independent Ridge() regression models, one for each class (One-Vs-Rest modelling).
Get a prediction from each class's Ridge() regression model (a real number per class) and then use argmax to predict the class.
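And a sketch of the multi-class recipe, which should reproduce RidgeClassifier's predictions on a toy dataset:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Ridge, RidgeClassifier
from sklearn.preprocessing import LabelBinarizer

X, y = make_classification(n_samples=300, n_features=8, n_informative=6,
                           n_classes=3, random_state=0)

clf = RidgeClassifier(alpha=1.0).fit(X, y)

# Manual One-vs-Rest: binarize labels to {-1, +1} columns, fit one Ridge
# per column, then take the argmax over the per-class predictions
Y = LabelBinarizer(neg_label=-1, pos_label=1).fit_transform(y)
reg = Ridge(alpha=1.0).fit(X, Y)
manual_pred = np.argmax(reg.predict(X), axis=1)

print(np.array_equal(clf.predict(X), manual_pred))  # expected: True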
Keras uses a class_weight parameter to deal with imbalanced datasets.
Here is what we can find in the doc:
Optional dictionary mapping class indices (integers) to a weight (float) to apply to the model's loss for the samples from this class during training. This can be useful to tell the model to "pay more attention" to samples from an under-represented class.
Does that mean that class_weight gives a different weight to each class in the training loss function? Does it have an influence elsewhere? Is it really effective against generalization error, compared with "physically" dropping instances from the most-represented class?
The class_weight parameter weights the loss associated with each training example in proportion to that class's under-representation in the training set. This counteracts class imbalance during training and should make your network more robust to generalization error.
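To illustrate the mechanics, a hand-rolled numpy sketch of a class-weighted loss (not Keras's internal code, just the idea):

import numpy as np

y_true = np.array([0, 0, 0, 1])            # imbalanced toy batch
y_pred = np.array([0.1, 0.2, 0.1, 0.6])    # predicted P(class 1)
class_weight = {0: 1.0, 1: 3.0}            # weight the rare class 3x

# Per-sample binary cross-entropy, scaled by each sample's class weight
per_sample = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
weights = np.where(y_true == 1, class_weight[1], class_weight[0])
print(np.mean(per_sample * weights))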
I'd exercise caution when physically dropping data instances corresponding to the most represented class, however - if your network is deep and therefore has significant representational capacity, culling your dataset can lead to overfitting, and consequently poor generalization to the validation/test sets.
I would recommend using the class_weight parameter as specified in the Keras documentation. If you really are intent on dropping data instances from the most represented class, ensure that you tune your network topology to decrease the model's representational capacity (i.e. add Dropout layers and/or L2 regularization).
I have a class imbalance problem and want to solve it using cost-sensitive learning. I see two options:
1) under-sample and over-sample
2) give weights to classes to use a modified loss function
Question
Scikit-learn has two options called class_weight and sample_weight. Is sample_weight actually doing option 2) and class_weight option 1)? Is option 2) the recommended way of handling class imbalance?
They are similar concepts: with sample_weight you can force the estimator to pay more attention to some samples, and with class_weight you can force the estimator to learn with attention to some particular class. sample_weight=0 or class_weight=0 basically means that the estimator doesn't need to take such samples/classes into consideration in the learning process at all. Thus a classifier (for example) will never predict some class if class_weight=0 for that class. If some sample_weight/class_weight is bigger than the sample_weight/class_weight of other samples/classes, the estimator will try to minimize the error on those samples/classes first. You can use user-defined sample_weight and class_weight simultaneously.
If you want to undersample/oversample your training set by simply cloning/removing samples, this is equivalent to increasing/decreasing the corresponding sample_weight/class_weight.
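A quick sketch of that equivalence with sklearn (toy data; equality holds up to solver tolerance):

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# Weighting the third sample by 2 ...
w = np.array([1.0, 1.0, 2.0, 1.0])
m1 = LogisticRegression(tol=1e-8).fit(X, y, sample_weight=w)

# ... should match physically duplicating that sample
X_dup = np.vstack([X, X[2:3]])
y_dup = np.concatenate([y, y[2:3]])
m2 = LogisticRegression(tol=1e-8).fit(X_dup, y_dup)

print(np.allclose(m1.coef_, m2.coef_))  # expected: True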
In more complex cases you can also try to artificially generate samples, with techniques like SMOTE.
sample_weight and class_weight have a similar function: to make your estimator pay more attention to some samples.
The actual sample weights will be sample_weight multiplied by the weights from class_weight.
This serves the same purpose as under-/oversampling, but the behavior is likely to be different: say you have an algorithm that randomly picks samples (like in random forests); then it matters whether you oversampled or not.
To sum it up:
class_weight and sample_weight both do 2); option 2) is one way to handle class imbalance. I don't know of a universally recommended way; I would try 1), 2), and 1) + 2) on your specific problem to see what works best.