Reporting other metrics during training evaluation with simpletransformers - Python

I am training a text classification model over a large set of data using the BERT classifier (bert-base-uncased) from the simpletransformers library. By default, simpletransformers reports mcc and eval_loss for evaluation during training and for the test (eval) phase. I was able to add metrics such as acc, f1, etc. to the test phase (by passing extra metric functions to the eval_model function), but I don't know how to tell simpletransformers to report these metrics during the training phase as well. Is it possible to do the same thing with the train_model function?
It is worth mentioning that the evaluate_during_training option is True.
Simpletransformers prints the mcc and eval_loss of the training for each checkpoint (in eval_results.txt in outputs), and I need the other metrics to be reported at each checkpoint as well. This is how I currently pass extra metrics to eval_model:
result, model_outputs, wrong_predictions = model.eval_model(eval_df, f1=f1_multiclass, acc=accuracy_score)
Thanks in advance
cheers

After surfing the web, I couldn't find the answer to my question, so I started looking at the source code. It turns out to be much simpler than I thought: to include more metrics during training, you pass them to train_model exactly the way you pass them to the eval_model method. Here is a sample that shows how to feed extra metrics to simpletransformers' train_model and eval_model (the metric functions come from sklearn.metrics):
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             precision_score, recall_score)

def f1_multiclass(labels, preds):
    return f1_score(labels, preds, average='weighted')

def prec_multiclass(labels, preds):
    return precision_score(labels, preds, average='weighted')

def recall_multiclass(labels, preds):
    return recall_score(labels, preds, average='weighted')

model.train_model(train_df, eval_df=test_df,
                  f1=f1_multiclass,
                  acc=accuracy_score,
                  prec=prec_multiclass,
                  recall=recall_multiclass,
                  cohen=cohen_kappa_score)

result, model_outputs, wrong_predictions = model.eval_model(test_df,
                                                            f1=f1_multiclass,
                                                            acc=accuracy_score,
                                                            prec=prec_multiclass,
                                                            recall=recall_multiclass,
                                                            cohen=cohen_kappa_score)
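For completeness, here is a minimal sketch of how the model itself might be created so that evaluation runs at each checkpoint during training; the model type, number of labels, and output directory below are assumptions for illustration, not part of the original question:

from simpletransformers.classification import ClassificationModel

model = ClassificationModel('bert', 'bert-base-uncased',
                            num_labels=4,  # hypothetical number of classes
                            args={'evaluate_during_training': True,
                                  'output_dir': 'outputs/'})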

Related

ModelCheckpoint monitoring values when the model has multiple outputs

My model has two outputs and I want to monitor one of them to save my model.
Below is part of my code. The TensorFlow version is 2.0.
model = MobileNetBaseModel()()
model.compile(optimizer=tf.keras.optimizers.Adam(),
              metrics={"pitch_yaw_roll": "mae"},
              loss={"pitch_yaw_roll": compute_mse_loss,  # or "mse"
                    "total_logits": compute_cross_entropy_loss(num_classes=num_classes)},
              loss_weights={"pitch_yaw_roll": mse_weight, "total_logits": cross_entropy_weight})
file_path = os.path.join(checkpoint_path, "model.{epoch:2d}-{val_loss:.2f}.h5")
tf.keras.callbacks.ModelCheckpoint(filepath=file_path,
                                   monitor="val_loss",
                                   verbose=1,
                                   save_freq=save_freq,
                                   save_best_only=True)
The ModelCheckpoint callback defaults to monitor='val_loss'; how do I choose what I need? I want to monitor the {"pitch_yaw_roll": "mae"} metric.
If you want ModelCheckpoint to save according to another metric value, use the key of that metric in the .compile(metrics={...}, ...) metrics dictionary.
So, for example, if you would like to save only the best "pitch_yaw_roll" epoch result (best being the minimum value), you should use:
tf.keras.callbacks.ModelCheckpoint(filepath=file_path,
                                   monitor="val_pitch_yaw_roll",
                                   verbose=1,
                                   mode="min",
                                   save_freq=save_freq,
                                   save_best_only=True)
If you opt for "pitch_yaw_roll" instead of "val_pitch_yaw_roll", it will save according to the metric computed on the training data rather than on the validation data.
Just adding to the comment above: I believe your checkpoint doesn't work because of an incorrect name for the value to monitor.
In general, a solution here is to take a peek into the history that your fit creates:
import pandas as pd

history = model.fit(...)
print(pd.DataFrame(history.history))
There you will find the names of the metrics you should use in the monitor argument.
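Note that the val_-prefixed entries only appear when validation data is supplied to fit. A minimal sketch (the variable names below are hypothetical, and the exact metric key, e.g. something like val_pitch_yaw_roll_mae, can vary between Keras versions, which is why printing history.history is worth doing):

history = model.fit(x_train,
                    {"pitch_yaw_roll": y_angles_train, "total_logits": y_logits_train},
                    validation_data=(x_val, {"pitch_yaw_roll": y_angles_val,
                                             "total_logits": y_logits_val}),
                    epochs=10)

print(list(history.history.keys()))  # the exact strings that monitor= accepts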

DNNClassifier's loss with EarlyStopping

I'm trying to use a hook in my DNNClassifier model using tensorflow.keras.callbacks.EarlyStopping, but I have no idea what to put in monitor. The documentation is not exactly helpful here.
From looking at the code, a softmax cross-entropy is used as the loss function, but for DNNRegressor the loss node is dnn/head/weighted_loss/Sum, as per this thread. I have tried getting TensorBoard up and running, but I am not able to, and the import script for a saved model is equally defective on my machine.
Is there any way to figure out what the node of the DNNClassifier's loss is?
The monitor does not refer to a graph node or a layer, but to a loss or metric value. In fact, any value that is present in your logs dictionary can be used: https://github.com/tensorflow/tensorflow/blob/r1.12/tensorflow/python/keras/callbacks.py#L676
You can inspect the values you have in logs without debugging, for instance by using CSVLogger:
from tensorflow.keras.callbacks import CSVLogger, EarlyStopping, LambdaCallback

csv_logger = CSVLogger(filename=os.path.join(args.log_dir, 'train.csv'),
                       separator=',', append=False)
If you cannot write to a file, you can print out everything you have in logs to stdout:
mycallback = LambdaCallback(on_epoch_end=lambda epoch, logs: print(
    '\n'.join(['{}: {}'.format(k, v) for k, v in logs.items()])))
In case you do not have the metric in logs yet, you can use a LambdaCallback to put it there. For instance (get_metric_value stands in for your own evaluation function):
eval_callback = LambdaCallback(on_epoch_end=lambda epoch, logs: logs.update(
    {'metric_name': get_metric_value()}))
early_stopping = EarlyStopping(monitor='metric_name', min_delta=0.0,
                               patience=10, verbose=1, mode='min')
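When wiring these together, keep in mind that callbacks run in list order and share the same logs dictionary, so the callback that injects the metric must come before EarlyStopping. A minimal sketch (x_train and y_train are placeholders for your own data):

model.fit(x_train, y_train,
          epochs=100,
          callbacks=[eval_callback, early_stopping])  # eval_callback first, so the metric
                                                      # is already in logs when EarlyStopping reads it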

Return number of epochs for EarlyStopping callback in Keras

Is there any way to return the number of epochs after which the training was stopped in Keras when using the EarlyStopping callback?
I can get the log of the training and validation loss and compute the number of epochs myself using the patience parameter, but is there a more direct way?
Use the EarlyStopping.stopped_epoch attribute: keep a reference to the callback in a separate variable, say callback, and check callback.stopped_epoch after the training stopped.
Subtracting the patience value from the total number of epochs - as suggested in this comment - might not work in some situations. For instance, if you set epochs=100 and patience=20 and the best accuracy/loss value is found at epoch 90, the training will stop at epoch 100, so this approach gives you a wrong number (100-20 = 80).
Moreover, as noted in this comment, EarlyStopping.stopped_epoch only gives you the epoch at which training stopped, NOT the epoch at which the best weights were obtained. This matters in particular when you set restore_best_weights=True or rely on ModelCheckpoint to save the best model before stopping the training.
Therefore my solution is to take the index of the best value in the model's history. Assuming the monitored metric is validation accuracy and relying on numpy, here is some code:
import numpy as np

model.fit(...)
hist = model.history.history['val_acc']
n_epochs_best = np.argmax(hist)  # 0-based index; add 1 for the 1-based epoch number
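Combining the two ideas, here is a minimal sketch, assuming a Keras version recent enough to have restore_best_weights (the data variables are placeholders):

import numpy as np
from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(monitor='val_acc', patience=20,
                               restore_best_weights=True)  # roll back to the best weights
history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=100, callbacks=[early_stopping])

best_epoch = np.argmax(history.history['val_acc']) + 1  # 1-based epoch of the best val_acc
stopped_epoch = early_stopping.stopped_epoch            # epoch at which training halted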
You can also leverage the History() callback to find out the number of epochs the model ran for. For example:
from keras.callbacks import History, EarlyStopping

history = History()
callbacks = [history, EarlyStopping(monitor='val_loss', patience=5, verbose=1, min_delta=1e-4)]
history = model.fit_generator(...., callbacks=callbacks)
number_of_epochs_it_ran = len(history.history['loss'])

Tensorflow - Using tf.summary with 1.2 Estimator API

I'm trying to add some TensorBoard logging to a model which uses the new tf.estimator API.
I have a hook set up like so:
summary_hook = tf.train.SummarySaverHook(
    save_secs=2,
    output_dir=MODEL_DIR,
    summary_op=tf.summary.merge_all())
# ...
classifier.train(
    input_fn,
    steps=1000,
    hooks=[summary_hook])
In my model_fn, I am also creating a summary:
def model_fn(features, labels, mode):
    # ... model stuff, calculate the value of loss
    tf.summary.scalar("loss", loss)
    # ...
However, when I run this code, I get the following error from the summary_hook: "Exactly one of scaffold or summary_op must be provided." This is probably because tf.summary.merge_all() is not finding any summaries and is returning None, despite the tf.summary.scalar I declared in the model_fn.
Any ideas why this wouldn't be working?
Use tf.train.Scaffold() and pass tf.summary.merge_all() as follows:
summary_hook = tf.train.SummarySaverHook(
    save_secs=2,
    output_dir=MODEL_DIR,
    scaffold=tf.train.Scaffold(summary_op=tf.summary.merge_all()))
Just for whoever has this question in the future: the selected solution doesn't work for me (see my comments on the selected solution).
Actually, with the TF 1.2 Estimator API, one doesn't need a summary_hook at all. I just have tf.summary.scalar("loss", loss) in the model_fn and run the code without a summary_hook. The loss is recorded and shown in TensorBoard. I'm not sure if the TF API changed after this and similar questions were asked.
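In other words, something like this minimal model_fn sketch is enough; the input key "x", num_classes, and the choice of optimizer are assumptions for illustration:

def model_fn(features, labels, mode):
    logits = tf.layers.dense(features["x"], units=num_classes)  # "x" and num_classes are hypothetical
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    tf.summary.scalar("loss", loss)  # picked up by the Estimator's default summary saver
    train_op = tf.train.AdamOptimizer().minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)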
With TensorFlow r1.3:
Add your summary ops in your estimator model_fn, for example:
tf.summary.histogram(tensorOp.name, tensorOp)
If you feel that writing summaries may consume time and space, you can control the writing frequency of summaries in your Estimator's run_config:
run_config = tf.contrib.learn.RunConfig()
run_config = run_config.replace(model_dir=FLAGS.model_dir)
run_config = run_config.replace(save_summary_steps=150)
Note: this will affect the overall summary-writing frequency for TensorBoard logging of your estimator (tf.estimator.Estimator).
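As an aside (an assumption on my part about later 1.x releases, not something stated above): tf.estimator.RunConfig accepts save_summary_steps directly, which avoids tf.contrib:

run_config = tf.estimator.RunConfig(model_dir=FLAGS.model_dir,
                                    save_summary_steps=150)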

How to use tf.maybe_batch to switch between train/val pipelines?

I have a pipeline that reads train and validation datasets from tfrecords.
I build batches using tf.train.batch. During training, I want to switch between training and evaluation on the validation dataset.
Here is a simplified snippet of how I implement it now:
is_training_pl = tf.placeholder(tf.bool)

images_train, labels_train = tf.train.batch([img_train, label_train], batch_size=batch_size)
images_val, labels_val = tf.train.batch([img_val, label_val], batch_size=batch_size)

data = tf.cond(is_training_pl,
               lambda: [images_train, labels_train],
               lambda: [images_val, labels_val])

loss = my_model(input=data)
I know that one can do it with tf.cond, but the problem is that both the train and val batch ops are executed whenever tf.cond is evaluated.
On GitHub, ebrevdo said (link to the comment) that it's possible to use tf.train.maybe_batch for this purpose instead, which is more efficient.
Can anyone give an example of how to use tf.train.maybe_batch in my case, please?
