Calculating F1 score, precision, recall in tfhub retraining script - python

I am using TensorFlow Hub for an image retraining classification task. The TensorFlow script retrain.py calculates cross_entropy and accuracy by default:
train_accuracy, cross_entropy_value = sess.run(
    [evaluation_step, cross_entropy],
    feed_dict={bottleneck_input: train_bottlenecks,
               ground_truth_input: train_ground_truth})
I would like to get the F1 score, precision, recall and confusion matrix. How could I get these values using this script?

Below I include a method to calculate the desired metrics using the scikit-learn package.
You can calculate F1 score, precision and recall with the precision_recall_fscore_support method, and the confusion matrix with the confusion_matrix method:
from sklearn.metrics import precision_recall_fscore_support, confusion_matrix
Both methods take two 1D array-like objects that store the ground-truth and predicted labels, respectively.
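For example, a quick toy call (the labels here are invented, purely to illustrate the expected inputs and outputs) looks like this:
from sklearn.metrics import precision_recall_fscore_support, confusion_matrix

y_true = [0, 1, 2, 2, 1]   # ground-truth class labels
y_pred = [0, 2, 2, 2, 0]   # predicted class labels
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average='micro')
cm = confusion_matrix(y_true, y_pred)   # array of shape (n_classes, n_classes)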
In the code provided, the ground-truth labels for the training data are stored in the train_ground_truth variable, which is defined in lines 1054 and 1060, while validation_ground_truth stores the ground-truth labels for the validation data and is defined in line 1087.
The tensor that calculates the predicted class labels is defined and returned by the add_evaluation_step function. You can modify line 1034 in order to capture that tensor object:
evaluation_step, prediction = add_evaluation_step(final_tensor, ground_truth_input)
# now prediction stores the tensor object that
# calculates predicted class labels
Now you can update line 1076 in order to evaluate prediction when calling sess.run():
train_accuracy, cross_entropy_value, train_predictions = sess.run(
    [evaluation_step, cross_entropy, prediction],
    feed_dict={bottleneck_input: train_bottlenecks,
               ground_truth_input: train_ground_truth})
# train_predictions now stores the class labels predicted by the model

# calculate precision, recall and F1 score
(train_precision,
 train_recall,
 train_f1_score, _) = precision_recall_fscore_support(y_true=train_ground_truth,
                                                      y_pred=train_predictions,
                                                      average='micro')

# calculate confusion matrix
train_confusion_matrix = confusion_matrix(y_true=train_ground_truth,
                                          y_pred=train_predictions)
Similarly, you can compute the metrics for the validation subset by modifying line 1095:
validation_summary, validation_accuracy, validation_predictions = sess.run(
    [merged, evaluation_step, prediction],
    feed_dict={bottleneck_input: validation_bottlenecks,
               ground_truth_input: validation_ground_truth})
# validation_predictions now stores the class labels predicted by the model

# calculate precision, recall and F1 score
(validation_precision,
 validation_recall,
 validation_f1_score, _) = precision_recall_fscore_support(y_true=validation_ground_truth,
                                                           y_pred=validation_predictions,
                                                           average='micro')

# calculate confusion matrix
validation_confusion_matrix = confusion_matrix(y_true=validation_ground_truth,
                                               y_pred=validation_predictions)
Finally, the code calls run_final_eval to evaluate the trained model on the test data. In this function, prediction and test_ground_truth are already defined, so you only need to include the code that calculates the required metrics:
test_accuracy, predictions = eval_session.run(
    [evaluation_step, prediction],
    feed_dict={
        bottleneck_input: test_bottlenecks,
        ground_truth_input: test_ground_truth
    })

# calculate precision, recall and F1 score
(test_precision,
 test_recall,
 test_f1_score, _) = precision_recall_fscore_support(y_true=test_ground_truth,
                                                     y_pred=predictions,
                                                     average='micro')

# calculate confusion matrix
test_confusion_matrix = confusion_matrix(y_true=test_ground_truth,
                                         y_pred=predictions)
Note that the provided code calculates globally averaged F1 scores by setting average='micro'. The different averaging methods supported by the scikit-learn package are described in the User Guide.
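For example, a macro-averaged variant (unweighted mean over classes) of the test metrics above could look like the snippet below; passing average=None instead returns one score per class. The variable names simply reuse those from the test snippet.
(test_precision_macro,
 test_recall_macro,
 test_f1_macro, _) = precision_recall_fscore_support(y_true=test_ground_truth,
                                                     y_pred=predictions,
                                                     average='macro')

per_class_scores = precision_recall_fscore_support(y_true=test_ground_truth,
                                                   y_pred=predictions,
                                                   average=None)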

Related

Python Keras weighted accuracy metric is much different than regular accuracy metric

I am training a Transformer model for time series classification. To check the results, I am using a Baseline model which uses the previous target as the next prediction. I am using a data generator to handle the data. The dataset is imbalanced, so I am using sample_weights to deal with this; the data generator therefore outputs 3 variables: inputs, labels, sample_weights.
I have tried setting the sample_weights to all 1's in order to test that things are working well. The baseline model produces identical results for weighted accuracy and regular accuracy, which is expected. However, for the Transformer I am seeing completely different values for the weighted accuracy and the regular accuracy, even though the sample_weights are all 1's. Since the sample weights are all 1's, I would expect the weighted accuracy to be the same as the regular accuracy. Why are these different?
It seems like the regular metric is normalized to 1 while the weighted metric is normalized to 100, but why would these be different in this case?
Code:
Function from data generator class to get sample weights
def get_sample_weights(self, inputs, labels):
    '''Obtains sample weights for any number of classes.
    NOTE: sample_weights pertain a weighting to each label
    '''
    # get initial sample weights
    sample_weights = tf.ones_like(labels, dtype=tf.float64)
    # get classes and counts for each one
    class_counts = np.bincount(self.train_df.price_change)
    total = class_counts.sum()
    n_classes = len(class_counts)
    weights = tf.constant([1, 1, 1], dtype=tf.float64)
    for idx, count in enumerate(class_counts):
        # compute weight
        # weight = total / (n_classes*count)
        weight = weights[idx]
        # update weight value
        sample_weights = tf.where(tf.equal(labels, float(idx)),
                                  weight,
                                  sample_weights)
    return inputs, labels, sample_weights
get baseline results
baseline = Baseline(label_index=single_gen.column_indices['price_change'])
baseline.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                 metrics=['accuracy'],
                 weighted_metrics=['accuracy'])

train_metrics = baseline.evaluate(single_gen.train)
val_metrics = baseline.evaluate(single_gen.valid)
Get results with Transformer
transformer_model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                          optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                          metrics=['sparse_categorical_accuracy'],
                          weighted_metrics=['sparse_categorical_accuracy'])

history = transformer_model.fit(aapl_gen.train,
                                epochs=2,
                                validation_data=aapl_gen.valid)

train_metrics = transformer_model.evaluate(data_gen.train)
val_metrics = transformer_model.evaluate(data_gen.valid)

How to output mean and stdv of Gaussian Process Classifier in sklearn?

I'm fitting some data for a classification task using Gaussian Process Classifiers in sklearn. I know that for the Gaussian Process Regressor one can pass return_std in
y_test, std = gp.predict(x_test, return_std=True)
to output the standard deviation of the test sample (like in this question)
However, I couldn't find such a parameter for the GP Classifier.
Is there such a thing as outputting the predictive mean and stdv of test data from a GP Classifier? And is there a way to output the posterior mean and covariance of the fitted model?
There is no standard deviation for categorical data, hence there is no return_std parameter in the classifier.
However, if you want to quantify the uncertainty of the classifier's predictions, you could use the .predict_proba(X) method. Once you get the probabilities of each possible class, you could compute the entropy of the predicted probabilities.
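A minimal sketch of that idea (gpc and x_test are assumed to be a fitted GaussianProcessClassifier and some test inputs; they are not defined in the question):
import numpy as np

proba = gpc.predict_proba(x_test)          # shape (n_samples, n_classes)
# entropy of each row: 0 means the classifier is certain,
# larger values mean more uncertainty
entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)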
You could get the variance associated with the logit by going to the predict_proba function definition in _gpc.py and returning the 'var_f_star' value. I have modified predict_proba and created a function that returns the logit variance below:
import numpy as np
from scipy.linalg import solve
from sklearn.utils.validation import check_is_fitted

def predict_var(self, X):
    """Return the variance of the latent (logit-scale) function for the test vector X.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features) or list of object
        Query points where the GP is evaluated for classification.

    Returns
    -------
    var_f_star : array-like of shape (n_samples,)
        Variance of the latent function f* at the query points.
    """
    check_is_fitted(self)
    # Based on Algorithm 3.2 of GPML
    K_star = self.kernel_(self.X_train_, X)  # K_star = k(x_star)
    f_star = K_star.T.dot(self.y_train_ - self.pi_)  # Line 4
    v = solve(self.L_, self.W_sr_[:, np.newaxis] * K_star)  # Line 5
    # Line 6 (compute np.diag(v.T.dot(v)) via einsum)
    var_f_star = self.kernel_.diag(X) - np.einsum("ij,ij->j", v, v)
    return var_f_star
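A hypothetical usage sketch (not part of the original answer; X_train, y_train and X_test are assumed): for a binary problem, the attributes used above (X_train_, pi_, L_, W_sr_, ...) live on the fitted internal Laplace estimator, so the helper can be called as a plain function on gpc.base_estimator_.
from sklearn.gaussian_process import GaussianProcessClassifier

gpc = GaussianProcessClassifier().fit(X_train, y_train)
logit_var = predict_var(gpc.base_estimator_, X_test)   # variance of the latent f*, shape (n_samples,)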

Customizing loss function in Keras with condition

I want to set up a Keras model (TensorFlow backend) for a multi-class classification problem with 4 different classes. I have both labeled and unlabeled data.
I have worked out the case in which I only train with the labeled data and my model looks something like this:
# create model
inputs = keras.Input(shape=(len(config.variables), ))
X = layers.Dense(units=200, activation="relu")(inputs)
output = layers.Dense(units=4, activation="softmax", name="output")(X)
model = keras.Model(inputs=inputs, outputs=output)
model.compile(optimizer=optimizers.Adam(1e-4), loss=loss_function, metrics=["accuracy"])
# train model
model.fit(
    x=train_data,
    y=train_class_labels,
    batch_size=200,
    epochs=200,
    verbose=2,
    validation_split=0.2,
    sample_weight=class_weights
)
I have functioning models with two different losses, namely categorical_crossentropy and sparse_categorical_crossentropy. Depending on the loss function, my train_class_labels were in one-hot representation (e.g. [[0,1,0,0], [0,0,0,1], ...]) or in the integer representation (e.g. [0,0,2,1,0,3, ...]), and everything worked fine. class_weights is some weight vector ([0.78, 1.34, ...]).
Now for my further plans I need to include the unlabeled data in the training process but I need it to be ignored by the loss function.
What I have tried:
Setting the labels of the unlabeled data to [0,0,0,0] when using categorical_crossentropy as the loss, because I thought my unlabeled data would then be ignored by the loss function. Somehow this changed the predictions after training.
I also tried setting the weights of the unlabeled data to 0, but that didn't have an effect either.
I concluded that I need to somehow mark my unlabeled data and customize my loss function so that it can be told to ignore those samples. Something like:
def custom_loss(y_true, y_pred):
    if y_true == labeled data:
        return normal loss function
    if y_true == unlabeled data:
        return 0
These are some snippets that I have found, but they do not seem to work:
def custom_loss(y_true, y_pred):
    loss = losses.sparse_categorical_crossentropy(y_true, y_pred)
    return K.switch(K.flatten(K.equal(y_true, -1)), K.zeros_like(loss), loss)

def custom_loss2(y_true, y_pred):
    idx = tf.not_equal(y_true, -1)
    y_true = tf.boolean_mask(y_true, idx)
    y_pred = tf.boolean_mask(y_pred, idx)
    return losses.sparse_categorical_crossentropy(y_true, y_pred)
In those examples I set the labels of the unlabeled data to -1, so train_class_labels would look something like this: [0,-1,2,0,3, ...]
But when using the first loss function I just get NaNs, and when using the second one I get the following error:
Invalid argument: logits and labels must have the same first dimension, got logits shape [1,5000] and labels shape [5000]
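For reference, a shape-consistent sketch of the masking idea (this is an assumption about the intended fix, not code from the question or the answer below): flatten y_true to one label per sample before building the mask, so the boolean mask can be applied to both tensors.
import tensorflow as tf

def masked_sparse_categorical_crossentropy(y_true, y_pred):
    y_true = tf.reshape(y_true, [-1])               # (batch,), unlabeled samples marked with -1
    mask = tf.not_equal(y_true, -1)                 # True for labeled samples
    y_true_kept = tf.boolean_mask(y_true, mask)     # (n_labeled,)
    y_pred_kept = tf.boolean_mask(y_pred, mask)     # (n_labeled, n_classes)
    # note: a batch consisting only of unlabeled samples would yield an empty loss
    return tf.keras.losses.sparse_categorical_crossentropy(y_true_kept, y_pred_kept)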
I think that setting the labels to [0,0,0,0] would be just fine, because the loss is calculated as the sum of the log losses of your instances per class (in your case, the loss would be 0 for instances with no label).
I don't understand why you are inserting unlabeled data into your training in a supervised setting.
I think that the differences you obtain are due to the batch size and the gradient step. If there are instances that do not contribute to the gradient descent, the calculated loss will be different than before, and then you get the difference in predictions.
Basically, there would be fewer informative instances per batch.
If you use the size of the whole dataset as the batch size, there would be no difference from a previous training without the unlabeled instances (provided that training also used batch size = size of the dataset).

How can I transform CatBoost's raw prediction score (RawFormulaVal) into a probability?

For some objects from the catboost library (like the Python code export model - https://tech.yandex.com/catboost/doc/dg/concepts/python-reference_catboostclassifier_save_model-docpage/), predictions (https://tech.yandex.com/catboost/doc/dg/concepts/python-reference_apply_catboost_model-docpage/) will only give a so-called raw score per record (the parameter value is called "RawFormulaVal").
Other API functions also allow the result of a prediction to be a probability for the target class (https://tech.yandex.com/catboost/doc/dg/concepts/python-reference_catboostclassifier_predict-docpage/) - the parameter value is called "Probability".
I would like to know
how this is related to probabilities (in the case of a binary classification) and
whether it can be transformed into one using the Python API (https://tech.yandex.com/catboost/doc/dg/concepts/python-quickstart-docpage/).
The raw score from the catboost prediction function with type "RawFormulaVal" is the log-odds (https://en.wikipedia.org/wiki/Logit).
So if we apply the function "exp(score) / (1 + exp(score))" we get the same probabilities as if we had used the prediction type "Probability".
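As a minimal sketch (assuming a fitted binary CatBoostClassifier called model and an evaluation Pool called eval_dataset, as in the sample code below), the conversion is just the logistic function applied to the raw score:
import numpy as np

raw = np.asarray(model.predict(eval_dataset, prediction_type='RawFormulaVal'))
prob_positive_class = 1.0 / (1.0 + np.exp(-raw))   # equivalent to exp(raw) / (1 + exp(raw))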
Alternatively, the line of code model.predict_proba(eval_dataset) will compute the probabilities directly.
The following is some sample code to illustrate:
from catboost import Pool, CatBoostClassifier, cv

train_dataset = Pool(data=X_train,
                     label=y_train,
                     cat_features=cat_features)

eval_dataset = Pool(data=X_valid,
                    label=y_valid,
                    cat_features=cat_features)

# Initialize CatBoostClassifier
model = CatBoostClassifier(iterations=30,
                           learning_rate=1,
                           depth=2,
                           loss_function='MultiClass')

# Fit model
model.fit(train_dataset)

# Get predicted classes
preds_class = model.predict(eval_dataset)

# Get predicted probabilities for each class
preds_proba = model.predict_proba(eval_dataset)

# Get predicted RawFormulaVal
preds_raw = model.predict(eval_dataset,
                          prediction_type='RawFormulaVal')

model.fit(train_dataset,
          use_best_model=True,
          eval_set=eval_dataset)

print("Count of trees in model = {}".format(model.tree_count_))
print(preds_proba)
print(preds_raw)
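As a sanity check (an assumption about the multiclass case rather than something stated in the original answer), applying a softmax over the raw scores of the MultiClass model above should reproduce the output of predict_proba:
import numpy as np

raw = np.asarray(preds_raw)
exp_raw = np.exp(raw - raw.max(axis=1, keepdims=True))   # subtract the row max for numerical stability
softmax_of_raw = exp_raw / exp_raw.sum(axis=1, keepdims=True)
print(np.allclose(softmax_of_raw, preds_proba, atol=1e-6))   # should print True if the raw scores are pre-softmax values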

How do masked values affect the metrics in Keras?

If I look into the Keras metrics, I see that the values of y_true and y_pred are "just" compared at the end of each epoch for categorical_accuracy:
def categorical_accuracy(y_true, y_pred):
    return K.cast(K.equal(K.argmax(y_true, axis=-1),
                          K.argmax(y_pred, axis=-1)),
                  K.floatx())
How are masked values handled? If I understood correctly, masking prevents the masked values from influencing the training, but the model still produces predictions for the masked values. Thereby, in my opinion, they do influence the metric.
More explanation on how it influences the metric:
In the padding/masking process, I set the padded/masked values in y_true to an unused class, e.g. class 0.
If argmax() now looks for the max value in the one-hot encoded y_true, it will just return 0, as the whole (masked) row contains the same value.
I do not have a class 0, as it is my masking value/class, and thereby y_pred and y_true will certainly have different values, creating a reduced accuracy.
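A quick illustration of the argmax behaviour described above (a toy example, not from the original question):
import numpy as np

masked_row = [0, 0, 0, 0]        # one-hot row of a masked timestep
print(np.argmax(masked_row))     # prints 0, while argmax of y_pred returns some real class index, so the comparison counts it as wrong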
Is this somehow already taken into account in the Keras metric and I overlooked it?
Otherwise, I would have to create a custom metric or callback that implements something similar to categorical_accuracy, with the addition that all masked values are eliminated from y_pred and y_true before the comparison.
Maybe the best answer would be this quote from the Keras metrics documentation:
A metric function is similar to a loss function, except that the results from evaluating a metric are not used when training the model.
The training is only influenced by the loss function where masking is implemented.
Nevertheless, your displayed results are not on par with the actual results and can lead to misleading conclusions.
As the metric is not used in the training process, a callback function can solve this.
Something like this (based on Andrew Ng). I search for 0 here, as for my masked targets all one-hot encoded entries are 0 (no class activated).
import numpy as np
from keras.callbacks import Callback
from sklearn.metrics import accuracy_score

class categorical_accuracy_no_mask(Callback):

    def on_train_begin(self, logs={}):
        self.val_acc = []

    def on_epoch_end(self, epoch, logs={}):
        val_predict = (np.asarray(self.model.predict(self.model.validation_data[0]))).round()
        val_targ = self.model.validation_data[1]
        # find where all targets are zero: those are the masked ones,
        # as we masked the target with 0 and the data with 666
        indx = np.where(~val_targ.any(axis=2))[0]
        y_true_nomask = np.delete(val_targ, indx, axis=0)
        y_pred_nomask = np.delete(val_predict, indx, axis=0)
        _val_accuracy = accuracy_score(y_true_nomask, y_pred_nomask)
        self.val_acc.append(_val_accuracy)
        print(" - val_accuracy : %f" % _val_accuracy)
        return
Of course, you could now also add precision, recall, etc.
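For instance, a hypothetical extension of on_epoch_end above (reusing y_true_nomask and y_pred_nomask) could flatten the one-hot targets and predictions to class indices and feed them to scikit-learn:
from sklearn.metrics import precision_recall_fscore_support

y_true_labels = np.argmax(y_true_nomask, axis=-1).ravel()
y_pred_labels = np.argmax(y_pred_nomask, axis=-1).ravel()
precision, recall, f1, _ = precision_recall_fscore_support(y_true_labels,
                                                           y_pred_labels,
                                                           average='micro')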
