I'm writing a custom training loop using the code provided in the TensorFlow DCGAN implementation guide. I want to add callbacks to the training loop. In Keras I know we pass them as an argument to the fit method, but I can't find resources on how to use these callbacks in a custom training loop. I'm adding the code for the custom training loop from the TensorFlow documentation:
# Notice the use of `tf.function`
# This annotation causes the function to be "compiled".
@tf.function
def train_step(images):
    noise = tf.random.normal([BATCH_SIZE, noise_dim])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)

        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))
def train(dataset, epochs):
    for epoch in range(epochs):
        start = time.time()

        for image_batch in dataset:
            train_step(image_batch)

        # Produce images for the GIF as we go
        display.clear_output(wait=True)
        generate_and_save_images(generator,
                                 epoch + 1,
                                 seed)

        # Save the model every 15 epochs
        if (epoch + 1) % 15 == 0:
            checkpoint.save(file_prefix=checkpoint_prefix)

        print('Time for epoch {} is {} sec'.format(epoch + 1, time.time() - start))

    # Generate after the final epoch
    display.clear_output(wait=True)
    generate_and_save_images(generator,
                             epochs,
                             seed)
I've had this problem myself: (1) I want to use a custom training loop; (2) I don't want to lose the bells and whistles Keras gives me in terms of callbacks; (3) I don't want to re-implement them all myself. TensorFlow has a design philosophy of allowing a developer to gradually opt in to its more low-level APIs. As @HyeonPhilYoun notes in his comment below, the official documentation for tf.keras.callbacks.Callback gives an example of what we're looking for.
The following has worked for me, but can be improved by reverse engineering tf.keras.Model.
The trick is to use tf.keras.callbacks.CallbackList and then manually trigger its lifecycle events from within your custom training loop. This example uses tqdm to give attractive progress bars, but CallbackList also has an add_progbar initialization argument that lets you use the default progress bar. training_model is a typical instance of tf.keras.Model.
from tqdm.notebook import tqdm, trange

# Populate with typical keras callbacks
_callbacks = []

callbacks = tf.keras.callbacks.CallbackList(
    _callbacks, add_history=True, model=training_model)

logs = {}
callbacks.on_train_begin(logs=logs)

# Presentation
epochs = trange(
    max_epochs,
    desc="Epoch",
    unit="Epoch",
    postfix="loss = {loss:.4f}, accuracy = {accuracy:.4f}")
epochs.set_postfix(loss=0, accuracy=0)

# Get a stable test set so epoch results are comparable
test_batches = batches(test_x, test_Y)

for epoch in epochs:
    callbacks.on_epoch_begin(epoch, logs=logs)

    # I like to formulate new batches each epoch
    # if there are data augmentation methods in play
    training_batches = batches(x, Y)

    # Presentation
    enumerated_batches = tqdm(
        enumerate(training_batches),
        desc="Batch",
        unit="batch",
        postfix="loss = {loss:.4f}, accuracy = {accuracy:.4f}",
        position=1,
        leave=False)

    for (batch, (x, y)) in enumerated_batches:
        training_model.reset_states()

        callbacks.on_batch_begin(batch, logs=logs)
        callbacks.on_train_batch_begin(batch, logs=logs)

        logs = training_model.train_on_batch(x=x, y=y, return_dict=True)

        callbacks.on_train_batch_end(batch, logs=logs)
        callbacks.on_batch_end(batch, logs=logs)

        # Presentation
        enumerated_batches.set_postfix(
            loss=float(logs["loss"]),
            accuracy=float(logs["accuracy"]))

    for (batch, (x, y)) in enumerate(test_batches):
        training_model.reset_states()

        callbacks.on_batch_begin(batch, logs=logs)
        callbacks.on_test_batch_begin(batch, logs=logs)

        logs = training_model.test_on_batch(x=x, y=y, return_dict=True)

        callbacks.on_test_batch_end(batch, logs=logs)
        callbacks.on_batch_end(batch, logs=logs)

    # Presentation
    epochs.set_postfix(
        loss=float(logs["loss"]),
        accuracy=float(logs["accuracy"]))

    callbacks.on_epoch_end(epoch, logs=logs)
    # NOTE: This is a decent place to check on your early stopping
    # callback.
    # Example: use training_model.stop_training to check for early stopping
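    # For example (illustrative): if an early-stopping callback in _callbacks
    # has set this flag, we can honor it here.
    if training_model.stop_training:
        break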
callbacks.on_train_end(logs=logs)
# Fetch the history object we normally get from keras.fit
history_object = None
for cb in callbacks:
    if isinstance(cb, tf.keras.callbacks.History):
        history_object = cb
assert history_object is not None
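Once training finishes, the recovered History callback behaves like the object returned by Model.fit; here is a minimal sketch, assuming a "loss" entry was logged each epoch:

per_epoch_losses = history_object.history["loss"]  # same structure as the History returned by fit
print("Last recorded loss:", per_epoch_losses[-1])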
The simplest way would be to check if the loss has changed over your expected period and break or manipulate the training process if not.
Here is one way you could implement a custom early stopping callback:
def Callback_EarlyStopping(LossList, min_delta=0.1, patience=20):
    # No early stopping for 2*patience epochs
    if len(LossList)//patience < 2:
        return False
    # Mean loss for last patience epochs and second-last patience epochs
    mean_previous = np.mean(LossList[::-1][patience:2*patience])  # second-last
    mean_recent = np.mean(LossList[::-1][:patience])  # last
    # you can use relative or absolute change
    delta_abs = np.abs(mean_recent - mean_previous)  # abs change
    delta_abs = np.abs(delta_abs / mean_previous)  # relative change
    if delta_abs < min_delta:
        print("*CB_ES* Loss didn't change much from last %d epochs" % (patience))
        print("*CB_ES* Percent change in loss value:", delta_abs*1e2)
        return True
    else:
        return False
This Callback_EarlyStopping checks your metric/loss every epoch and returns True if the relative change is smaller than expected, by computing the moving average of the losses over every patience number of epochs. You can then capture this True signal and break the training loop. To completely answer your question, within your sample training loop you can use it as:
gen_loss_seq = []
for epoch in range(epochs):
    # in your example, make sure your train_step returns gen_loss
    gen_loss = train_step(dataset)
    # ideally, you can have a validation_step and get gen_valid_loss
    gen_loss_seq.append(gen_loss)
    # check every 20 epochs and stop if gen_valid_loss doesn't change by 10%
    stopEarly = Callback_EarlyStopping(gen_loss_seq, min_delta=0.1, patience=20)
    if stopEarly:
        print("Callback_EarlyStopping signal received at epoch= %d/%d" % (epoch, epochs))
        print("Terminating training ")
        break
Of course, you can increase the complexity in numerous ways, for example: which loss or metric you would like to track, your interest in the loss at a particular epoch versus a moving average of the loss, your interest in relative versus absolute change in value, etc. You can refer to the TensorFlow 2.x implementation of tf.keras.callbacks.EarlyStopping here, which is generally used in the popular tf.keras.Model.fit method.
I think you would need to implement the functionality of the callback manually. It should not be too difficult. You could, for instance, have the train_step function return the losses and then implement the functionality of callbacks such as early stopping in your train function. For callbacks such as a learning rate schedule, the function tf.keras.backend.set_value(generator_optimizer.lr, new_lr) would come in handy. Therefore the functionality of the callback would be implemented in your train function.
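As a rough sketch of that idea applied to the train function from the question (this assumes train_step is modified to return gen_loss; the decay schedule and plateau threshold below are arbitrary placeholders):

def train(dataset, epochs):
    gen_loss_history = []
    for epoch in range(epochs):
        for image_batch in dataset:
            gen_loss = train_step(image_batch)  # assumes train_step now returns gen_loss
        gen_loss_history.append(float(gen_loss))

        # manual "learning rate schedule" callback: halve the generator LR every 10 epochs
        if (epoch + 1) % 10 == 0:
            old_lr = float(tf.keras.backend.get_value(generator_optimizer.lr))
            tf.keras.backend.set_value(generator_optimizer.lr, old_lr * 0.5)

        # manual "early stopping" callback: stop if the loss has barely moved over the last 5 epochs
        if len(gen_loss_history) > 5 and abs(gen_loss_history[-1] - gen_loss_history[-6]) < 1e-4:
            print('Generator loss has plateaued; stopping early.')
            break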
A custom training loop is just a normal Python loop, so you can use if statements to break the loop whenever some condition is met. For instance:
if len(loss_history) > patience:
    # stop if the oldest loss, improved by delta, is still below every more recent loss
    if loss_history.popleft()*(1 - delta) < min(loss_history):
        print(f'\nEarly stopping. No improvement of more than {delta:.5%} in '
              f'validation loss in the last {patience} epochs.')
        break
If there is no improvement of delta% in the loss in the past patience epochs, the loop will be broken. Here, I'm using a collections.deque, which can easily be used as a rolling list that keeps in memory only the information from the last patience epochs.
Here's a full implementation, using the example model from the TensorFlow documentation:
from collections import deque

patience = 3
delta = 0.001

loss_history = deque(maxlen=patience + 1)

for epoch in range(1, 25 + 1):
    train_loss = tf.metrics.Mean()
    train_acc = tf.metrics.CategoricalAccuracy()
    test_loss = tf.metrics.Mean()
    test_acc = tf.metrics.CategoricalAccuracy()

    for x, y in train:
        loss_value, grads = get_grad(model, x, y)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        train_loss.update_state(loss_value)
        train_acc.update_state(y, model(x, training=True))

    for x, y in test:
        loss_value, _ = get_grad(model, x, y)
        test_loss.update_state(loss_value)
        test_acc.update_state(y, model(x, training=False))

    print(verbose.format(epoch,
                         train_loss.result(),
                         test_loss.result(),
                         train_acc.result(),
                         test_acc.result()))

    loss_history.append(test_loss.result())

    if len(loss_history) > patience:
        if loss_history.popleft()*(1 - delta) < min(loss_history):
            print(f'\nEarly stopping. No improvement of more than {delta:.5%} in '
                  f'validation loss in the last {patience} epochs.')
            break
Epoch 1 Loss: 0.191 TLoss: 0.282 Acc: 68.920% TAcc: 89.200%
Epoch 2 Loss: 0.157 TLoss: 0.297 Acc: 70.880% TAcc: 90.000%
Epoch 3 Loss: 0.133 TLoss: 0.318 Acc: 71.560% TAcc: 90.800%
Epoch 4 Loss: 0.117 TLoss: 0.299 Acc: 71.960% TAcc: 90.800%
Early stopping. No improvement of more than 0.10000% in validation loss in the last 3 epochs.
aapa3e8's answer is correct, but below I am providing an implementation of Callback_EarlyStopping that is more similar to tf.keras.callbacks.EarlyStopping:
def Callback_EarlyStopping(MetricList, min_delta=0.1, patience=20, mode='min'):
    # No early stopping for the first patience epochs
    if len(MetricList) <= patience:
        return False

    min_delta = abs(min_delta)
    if mode == 'min':
        min_delta *= -1
    else:
        min_delta *= 1

    # last patience epochs
    last_patience_epochs = [x + min_delta for x in MetricList[::-1][1:patience + 1]]
    current_metric = MetricList[::-1][0]

    if mode == 'min':
        if current_metric >= max(last_patience_epochs):
            print(f'Metric did not decrease for the last {patience} epochs.')
            return True
        else:
            return False
    else:
        if current_metric <= min(last_patience_epochs):
            print(f'Metric did not increase for the last {patience} epochs.')
            return True
        else:
            return False
I tested @Rob Hall's method with TensorBoard callbacks and it did indeed work. So in my case it looked like this:
tensorboard_callback = keras.callbacks.TensorBoard(
    log_dir='./callbacks/tensorboard',
    histogram_freq=1)

_callbacks = [tensorboard_callback]
callbacks = keras.callbacks.CallbackList(
    _callbacks, add_history=True, model=encoder)

logs_ae = {}
callbacks.on_train_begin(logs=logs_ae)
...
...
Related
I am training a Graph Convolutional Network (GCN). My training code is as follows (the complete code is long, so I include just the relevant snippet):
for i in range(1, runs + 1):
    data = data.to(device)
    model.to(device).reset_parameters()
    optimizer = Adam(model.parameters(), lr=0.001, weight_decay=5e-4)

    if torch.cuda.is_available():
        torch.cuda.synchronize()

    t_start = time.perf_counter()

    best_val_loss = float('inf')
    test_acc = 0

    for epoch in range(1, epochs + 1):
        train(model, optimizer, data)
        eval_info = evaluate(model, data)

        if eval_info['val_loss'] < best_val_loss:
            best_val_loss = eval_info['val_loss']
            test_acc = eval_info['test_acc']

    if torch.cuda.is_available():
        torch.cuda.synchronize()

    t_end = time.perf_counter()
My question is about resetting the parameters. If I use model.to(device) instead of model.to(device).reset_parameters(), the mean accuracy of all runs (e.g. 10 runs) is around 5% higher than when I keep model.to(device).reset_parameters(). The validation losses of the two cases are nearly the same. I wonder whether I should reset the parameters in each run or not?
I have the following code that trains a model and stores logs in a results variable
import tqdm.notebook as tq
import sys

num_epochs = 10
results = {"train_loss": [], "val_loss": [], "train_acc": [], "val_acc": []}

for epoch in range(1, num_epochs+1):
    sys.stdout.write(f"---Epoch {epoch}/{num_epochs}: ")
    epoch_loss = {"train": [], "val": []}
    epoch_acc = {"train": [], "val": []}

    for phase in ['train', 'val']:
        if phase == "train":
            model.train(True)
        else:
            model.train(False)

        # most important thing I learned from this project was how to fix tqdm nastiness in colab
        for batch_idx, (x, y) in tq.tqdm(enumerate(dataloaders[phase]),
                                         total=len(dataloaders[phase]),
                                         leave=False):
            # put data to device and get output
            x, y = x.to(device), y.to(device)
            preds = model(x)

            # calc and log model loss
            batch_loss = criterion(preds, y)
            epoch_loss[phase].append(batch_loss.item())

            # calculate acc and extend to epoch_acc
            preds = torch.argmax(preds, dim=1)
            batch_acc = torch.sum(preds == y)/len(y)
            epoch_acc[phase].append(batch_acc)

            # zero the grad
            optimizer.zero_grad()

            # take a step if training mode is on
            if phase == "train":
                batch_loss.backward()
                optimizer.step()
                scheduler.step()

    # at the end of each epoch, calculate avg epoch train/val loss/accuracy
    train_loss = sum(epoch_loss["train"])/len(epoch_loss["train"])
    val_loss = sum(epoch_loss["val"])/len(epoch_loss["val"])
    train_acc = 100*sum(epoch_acc["train"])/len(epoch_acc["train"])
    val_acc = 100*sum(epoch_acc["val"])/len(epoch_acc["val"])

    # log losses and accs every epoch
    results['train_loss'].extend(epoch_loss['train'])
    results['train_acc'].extend(epoch_acc['train'])
    results['val_loss'].extend(epoch_loss['val'])
    results['val_acc'].extend(epoch_acc['val'])

    # and print it nicely
    sys.stdout.write("train_loss: {:.4f} train_acc: {:.2f}% ".format(train_loss, train_acc))
    sys.stdout.write("val_loss: {:.4f} val_acc: {:.2f}%\n".format(val_loss, val_acc))
I'm logging the avg accuracy and avg loss of every batch into separate training/validation loss/acc arrays. The problem is that I have more training batches, so when I try to graph my training logs the training and validation curves come out with different lengths:
Is there a workaround for this?
You are making a few conceptual errors:
- You are calculating the validation loss/accuracy in multiple batches, as opposed to over the entire validation set
- You are calculating the validation accuracy for a static model after it has already trained on all the data, as opposed to periodically assessing the validation accuracy as it is training
You should average your batch training performance over each epoch, and once per epoch calculate the complete loss/acc statistics across the entire validation set. Then you will have n_epochs values for both training and validation and can plot them on the same axes.
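A minimal sketch of that idea, reusing the names from the question (model, criterion, optimizer, device, dataloaders); only the bookkeeping is shown:

train_loss_hist, val_loss_hist = [], []

for epoch in range(num_epochs):
    # training: average the batch losses over the epoch
    model.train(True)
    batch_losses = []
    for x, y in dataloaders['train']:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        batch_losses.append(loss.item())
    train_loss_hist.append(sum(batch_losses) / len(batch_losses))

    # validation: one aggregate number over the whole validation set, once per epoch
    model.train(False)
    with torch.no_grad():
        val_losses = [criterion(model(x.to(device)), y.to(device)).item()
                      for x, y in dataloaders['val']]
    val_loss_hist.append(sum(val_losses) / len(val_losses))

# both lists now hold num_epochs values and can be plotted on the same x-axis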
Please add a brief comment on your thoughts so that I can improve my query. Thank you. :-)
I'm trying to train a tf.keras model with Gradient Accumulation (GA). But I don't want to use it in a custom training loop; instead I want to customize the .fit() method by overriding train_step. Is it possible? How can this be accomplished? The reason is that if we want the benefit of Keras built-in functionality like fit and callbacks, we don't want to use a custom training loop, but at the same time, if we want to override train_step for some reason (like GA or something else), we can customize the fit method and still leverage those built-in functions.
Also, I know the pros of using GA, but what are the major cons of using it? Why doesn't it come as a default, rather than an optional, feature of the framework?
# overriding train step
# my attempt
# it's not appropriately implemented
# and need to fix
class CustomTrainStep(keras.Model):
    def __init__(self, n_gradients, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.n_gradients = n_gradients
        self.gradient_accumulation = [
            tf.zeros_like(this_var) for this_var in self.trainable_variables
        ]

    def train_step(self, data):
        x, y = data
        batch_size = tf.cast(tf.shape(x)[0], tf.float32)

        # Gradient Tape
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(
                y, y_pred, regularization_losses=self.losses
            )

        # Calculate batch gradients
        gradients = tape.gradient(loss, self.trainable_variables)

        # Accumulate batch gradients
        accum_gradient = [
            (acum_grad + grad) for acum_grad, grad in
            zip(self.gradient_accumulation, gradients)
        ]
        accum_gradient = [
            this_grad / batch_size for this_grad in accum_gradient
        ]

        # apply accumulated gradients
        self.optimizer.apply_gradients(
            zip(accum_gradient, self.trainable_variables)
        )
        # TODO: reset self.gradient_accumulation

        # update metrics
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}
Please, run and check with the following toy setup.
# Model
size = 32
input = keras.Input(shape=(size, size, 3))
efnet = keras.applications.DenseNet121(
    weights=None,
    include_top=False,
    input_tensor=input
)
base_maps = keras.layers.GlobalAveragePooling2D()(efnet.output)
base_maps = keras.layers.Dense(
    units=10, activation='softmax',
    name='primary'
)(base_maps)
custom_model = CustomTrainStep(
    n_gradients=10, inputs=[input], outputs=[base_maps]
)

# bind all
custom_model.compile(
    loss=keras.losses.CategoricalCrossentropy(),
    metrics=['accuracy'],
    optimizer=keras.optimizers.Adam()
)

# data
(x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()
x_train = tf.expand_dims(x_train, -1)
x_train = tf.repeat(x_train, 3, axis=-1)
x_train = tf.divide(x_train, 255)
x_train = tf.image.resize(x_train, [size, size])  # if we want to resize
y_train = tf.one_hot(y_train, depth=10)

# customized fit
custom_model.fit(x_train, y_train, batch_size=64, epochs=3, verbose=1)
Update
I've found that some others have also tried to achieve this and ended up with the same issue. One has got some workaround, here, but it's too messy and I think there should be a better approach.
Update 2
The accepted answer (by Mr.For Example) is fine and works well with a single strategy. Now I would like to start a 2nd bounty to extend it to support multi-GPU, TPU, and mixed-precision training. There are some complications; see details.
Yes, it is possible to customize the .fit() method by overriding train_step without a custom training loop. The following simple example will show you how to train a simple MNIST classifier with gradient accumulation:
import tensorflow as tf

class CustomTrainStep(tf.keras.Model):
    def __init__(self, n_gradients, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.n_gradients = tf.constant(n_gradients, dtype=tf.int32)
        self.n_acum_step = tf.Variable(0, dtype=tf.int32, trainable=False)
        self.gradient_accumulation = [tf.Variable(tf.zeros_like(v, dtype=tf.float32), trainable=False) for v in self.trainable_variables]

    def train_step(self, data):
        self.n_acum_step.assign_add(1)

        x, y = data
        # Gradient Tape
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        # Calculate batch gradients
        gradients = tape.gradient(loss, self.trainable_variables)
        # Accumulate batch gradients
        for i in range(len(self.gradient_accumulation)):
            self.gradient_accumulation[i].assign_add(gradients[i])

        # If n_acum_step reaches n_gradients then we apply accumulated gradients to update the variables, otherwise do nothing
        tf.cond(tf.equal(self.n_acum_step, self.n_gradients), self.apply_accu_gradients, lambda: None)

        # update metrics
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

    def apply_accu_gradients(self):
        # apply accumulated gradients
        self.optimizer.apply_gradients(zip(self.gradient_accumulation, self.trainable_variables))

        # reset
        self.n_acum_step.assign(0)
        for i in range(len(self.gradient_accumulation)):
            self.gradient_accumulation[i].assign(tf.zeros_like(self.trainable_variables[i], dtype=tf.float32))
# Model
input = tf.keras.Input(shape=(28, 28))
base_maps = tf.keras.layers.Flatten(input_shape=(28, 28))(input)
base_maps = tf.keras.layers.Dense(128, activation='relu')(base_maps)
base_maps = tf.keras.layers.Dense(units=10, activation='softmax', name='primary')(base_maps)
custom_model = CustomTrainStep(n_gradients=10, inputs=[input], outputs=[base_maps])

# bind all
custom_model.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=['accuracy'],
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))

# data
(x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()
x_train = tf.divide(x_train, 255)
y_train = tf.one_hot(y_train, depth=10)

# customized fit
custom_model.fit(x_train, y_train, batch_size=6, epochs=3, verbose=1)
Outputs:
Epoch 1/3
10000/10000 [==============================] - 13s 1ms/step - loss: 0.5053 - accuracy: 0.8584
Epoch 2/3
10000/10000 [==============================] - 13s 1ms/step - loss: 0.1389 - accuracy: 0.9600
Epoch 3/3
10000/10000 [==============================] - 13s 1ms/step - loss: 0.0898 - accuracy: 0.9748
Pros:
Gradient accumulation is a mechanism to split the batch of samples — used for training a neural network — into several mini-batches of samples that will be run sequentially.
Because GA calculates the loss and gradients after each mini-batch, but instead of updating the model parameters waits and accumulates the gradients over consecutive batches, it can overcome memory constraints, i.e. it uses less memory to train the model than a single large batch would.
Example: If you run a gradient accumulation with steps of 5 and a batch size of 4 images, it serves almost the same purpose as running with a batch size of 20 images.
We can also parallelize the training when using GA, i.e. aggregate gradients from multiple machines.
Things to consider:
This technique works so well that it is widely used. There are a few things to consider before using it, but I don't think they should be called cons; after all, all GA does is turn 4 + 4 into 2 + 2 + 2 + 2.
If your machine already has sufficient memory for a batch size that is large enough, there is no need to use it: it is well known that too large a batch size will lead to poor generalization, and it will certainly run slower if you use GA to achieve the same batch size that your machine's memory can already handle.
Reference:
What is Gradient Accumulation in Deep Learning?
Thanks to @Mr.For Example for his convenient answer.
I also observed that using Gradient Accumulation won't speed up training, since we are doing n_gradients forward passes and computing all the gradients. But it will speed up the convergence of our model. And I found that the mixed_precision technique can be really helpful here. Details here.
policy = tf.keras.mixed_precision.Policy('mixed_float16')
tf.keras.mixed_precision.experimental.set_policy(policy)
Here is a complete gist.
I am doing multi-class classification for a recommender system (item recommendations), and I'm currently training my network using sparse_categorical_crossentropy loss. Therefore, it is reasonable to perform EarlyStopping by monitoring my validation loss, val_loss as such:
tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)
which works as expected. However, the performance of the network (recommender system) is measured by Average-Precision-at-10, and is tracked as a metric during training, as average_precision_at_k10. Because of this, I could also perform early stopping with this metric as such:
tf.keras.callbacks.EarlyStopping(monitor='average_precision_at_k10', patience=10)
which also works as expected.
My problem:
Sometimes the validation loss increases, whilst the Average-Precision-at-10 is improving and vice-versa. Because of this, I would need to monitor both, and perform early stopping, if and only if both are deteriorating. What I would like to do:
tf.keras.callbacks.EarlyStopping(monitor=['val_loss', 'average_precision_at_k10'], patience=10)
which obviously does not work. Any ideas how this could be done?
With guidance from Gerry P above I managed to create my own custom EarlyStopping callback, and thought I would post it here in case anyone else is looking to implement something similar.
If both the validation loss and the mean average precision at 10 do not improve for patience number of epochs, early stopping is performed.
class CustomEarlyStopping(keras.callbacks.Callback):
    def __init__(self, patience=0):
        super(CustomEarlyStopping, self).__init__()
        self.patience = patience
        self.best_weights = None

    def on_train_begin(self, logs=None):
        # The number of epochs it has waited when loss is no longer minimum.
        self.wait = 0
        # The epoch the training stops at.
        self.stopped_epoch = 0
        # Initialize the best as infinity.
        self.best_v_loss = np.Inf
        self.best_map10 = 0

    def on_epoch_end(self, epoch, logs=None):
        v_loss = logs.get('val_loss')
        map10 = logs.get('val_average_precision_at_k10')

        # If BOTH the validation loss AND map10 do not improve for 'patience' epochs, stop training early.
        if np.less(v_loss, self.best_v_loss) and np.greater(map10, self.best_map10):
            self.best_v_loss = v_loss
            self.best_map10 = map10
            self.wait = 0
            # Record the best weights if the current results are better.
            self.best_weights = self.model.get_weights()
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.stopped_epoch = epoch
                self.model.stop_training = True
                print("Restoring model weights from the end of the best epoch.")
                self.model.set_weights(self.best_weights)

    def on_train_end(self, logs=None):
        if self.stopped_epoch > 0:
            print("Epoch %05d: early stopping" % (self.stopped_epoch + 1))
It is then used as:
model.fit(
    x_train,
    y_train,
    batch_size=64,
    steps_per_epoch=5,
    epochs=30,
    verbose=0,
    callbacks=[CustomEarlyStopping(patience=10)],
)
You can achieve this by creating a custom callback. Information on how to do that is located here. Below is some code that illustrates what you can do in a custom callback. The documentation I referenced shows many other options.
class LRA(keras.callbacks.Callback):  # subclass the Callback class
    # create class variables as below. These can be accessed in your code outside the
    # class definition as LRA.my_class_variable, LRA.best_weights
    my_class_variable = something  # a class variable (placeholder value)
    best_weights = model.get_weights()  # another class variable

    # define an initialization function with the parameters you want to feed to the callback
    def __init__(self, param1, param2, etc):
        super(LRA, self).__init__()
        self.param1 = param1
        self.param2 = param2
        # etc. for all parameters
        # write any initialization code you need here

    def on_epoch_end(self, epoch, logs=None):  # method runs at the end of each epoch
        v_loss = logs.get('val_loss')  # example of getting log data at the end of an epoch: the validation loss
        acc = logs.get('accuracy')  # another example of getting log data
        LRA.best_weights = self.model.get_weights()  # example of setting a class variable value
        print(f'Hello, epoch {epoch} has just ended')  # print a message at the end of every epoch
        lr = float(tf.keras.backend.get_value(self.model.optimizer.lr))  # get the current learning rate
        if v_loss > self.param1:
            new_lr = lr * self.param2
            tf.keras.backend.set_value(self.model.optimizer.lr, new_lr)  # set the learning rate in the optimizer
        # write whatever code you need
I recommend you create your own callback.
In the following I added a solution that monitors both the accuracy and the loss. You can replace acc with your own metric:
class CustomCallback(keras.callbacks.Callback):
    acc = {}
    loss = {}
    best_weights = None

    def __init__(self, patience=None):
        super(CustomCallback, self).__init__()
        self.patience = patience

    def on_epoch_end(self, epoch, logs=None):
        epoch += 1
        self.loss[epoch] = logs['loss']
        self.acc[epoch] = logs['accuracy']

        if self.patience and epoch > self.patience:
            # best weights if the current loss is less than the loss 'patience' epochs ago.
            # Similarly for acc, but when larger.
            if self.loss[epoch] < self.loss[epoch-self.patience] and self.acc[epoch] > self.acc[epoch-self.patience]:
                self.best_weights = self.model.get_weights()
            else:
                # stop training
                self.model.stop_training = True
                # Load the best weights
                self.model.set_weights(self.best_weights)
        else:
            # best weights are the current weights
            self.best_weights = self.model.get_weights()
Please bear in mind that if you want to control the minimum change in the monitored quantity (i.e. min_delta), you have to integrate it into the code yourself.
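For example, a minimal sketch of how a min_delta could be folded into the comparison above (the 0.001 threshold is just a placeholder):

min_delta = 0.001  # hypothetical minimum change that counts as an improvement

# inside on_epoch_end, instead of the bare comparisons:
loss_improved = self.loss[epoch] < self.loss[epoch - self.patience] - min_delta
acc_improved = self.acc[epoch] > self.acc[epoch - self.patience] + min_delta
if loss_improved and acc_improved:
    self.best_weights = self.model.get_weights()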
Here is the documentation for how to build your custom callback: custom_callback
At this point it would be simpler to make a custom loop and just use if-statements. E.g.:
def main(epochs=50):
    for epoch in range(epochs):
        fit(epoch)

        if test_acc.result() > .8 and topk_acc.result() > .9:
            print(f'\nEarly stopping. Test acc is above 80% and TopK acc is above 90%.')
            break

if __name__ == '__main__':
    main(epochs=100)
Here's a simple custom training loop using this method:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow_datasets as tfds
import tensorflow as tf

data, info = tfds.load('iris', split='train',
                       as_supervised=True,
                       shuffle_files=True,
                       with_info=True)

def preprocessing(inputs, targets):
    scaled = tf.divide(inputs, tf.reduce_max(inputs, axis=0))
    return scaled, targets

dataset = data.filter(lambda x, y: tf.less_equal(y, 2)).\
    map(preprocessing).\
    shuffle(info.splits['train'].num_examples)

train_dataset = dataset.take(120).batch(4)
test_dataset = dataset.skip(120).take(30).batch(4)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(info.features['label'].num_classes, activation='softmax')
])

# the model outputs probabilities (softmax), so from_logits stays False
loss_object = tf.losses.SparseCategoricalCrossentropy(from_logits=False)

train_loss = tf.metrics.Mean()
test_loss = tf.metrics.Mean()

train_acc = tf.metrics.SparseCategoricalAccuracy()
test_acc = tf.metrics.SparseCategoricalAccuracy()

topk_acc = tf.metrics.SparseTopKCategoricalAccuracy(k=2)

opt = tf.keras.optimizers.Adam(learning_rate=1e-3)

@tf.function
def train_step(inputs, labels):
    with tf.GradientTape() as tape:
        logits = model(inputs)
        loss = loss_object(labels, logits)

    gradients = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(loss)
    train_acc(labels, logits)

@tf.function
def test_step(inputs, labels):
    logits = model(inputs)
    loss = loss_object(labels, logits)
    test_loss.update_state(loss)
    test_acc.update_state(labels, logits)
    topk_acc.update_state(labels, logits)

def fit(epoch):
    template = 'Epoch {:>2} Train Loss {:.3f} Test Loss {:.3f} ' \
               'Train Acc {:.2f} Test Acc {:.2f} Test TopK Acc {:.2f} '

    train_loss.reset_states()
    test_loss.reset_states()
    train_acc.reset_states()
    test_acc.reset_states()
    topk_acc.reset_states()

    for X_train, y_train in train_dataset:
        train_step(X_train, y_train)

    for X_test, y_test in test_dataset:
        test_step(X_test, y_test)

    print(template.format(
        epoch + 1,
        train_loss.result(),
        test_loss.result(),
        train_acc.result(),
        test_acc.result(),
        topk_acc.result()
    ))

def main(epochs=50):
    for epoch in range(epochs):
        fit(epoch)

        if test_acc.result() > .8 and topk_acc.result() > .9:
            print(f'\nEarly stopping. Test acc is above 80% and TopK acc is above 90%.')
            break

if __name__ == '__main__':
    main(epochs=100)
I've written code in PyTorch with my own loss function, focal_loss_fixed. But the loss value stays fixed after every epoch. It looks like the weights are not being updated. Here is my code snippet:
optimizer = optim.SGD(net.parameters(),
                      lr=lr,
                      momentum=0.9,
                      weight_decay=0.0005)

for epoch in T(range(20)):
    net.train()
    epoch_loss = 0
    for n in range(len(x_train)//batch_size):
        (imgs, true_masks) = data_gen_small(x_train, y_train, iter_num=n, batch_size=batch_size)
        temp = []
        for tt in true_masks:
            temp.append(tt.reshape(128, 128, 1))
        true_masks = np.copy(np.array(temp))
        del temp
        imgs = np.swapaxes(imgs, 1, 3)
        imgs = torch.from_numpy(imgs).float().cuda()
        true_masks = torch.from_numpy(true_masks).float().cuda()
        masks_pred = net(imgs)
        masks_probs = F.sigmoid(masks_pred)
        masks_probs_flat = masks_probs.view(-1)
        true_masks_flat = true_masks.view(-1)
        print((focal_loss_fixed(tf.convert_to_tensor(true_masks_flat.data.cpu().numpy()), tf.convert_to_tensor(masks_probs_flat.data.cpu().numpy()))))
        loss = torch.from_numpy(np.array(focal_loss_fixed(tf.convert_to_tensor(true_masks_flat.data.cpu().numpy()), tf.convert_to_tensor(masks_probs_flat.data.cpu().numpy())))).float().cuda()
        loss = Variable(loss.data, requires_grad=True)
        epoch_loss *= (n/(n+1))
        epoch_loss += loss.item()*(1/(n+1))
        print('Step: {0:.2f}% --- loss: {1:.6f}'.format(n * batch_size * 100.0 / len(x_train), epoch_loss), end='\r')
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print('Epoch finished ! Loss: {}'.format(epoch_loss))
And this is my focal_loss_fixed function:
def focal_loss_fixed(true_data, pred_data):
    gamma = 2.
    alpha = .25
    eps = 1e-7
    # print(type(y_true), type(y_pred))
    pred_data = K.clip(pred_data, eps, 1-eps)
    pt_1 = tf.where(tf.equal(true_data, 1), pred_data, tf.ones_like(pred_data))
    pt_0 = tf.where(tf.equal(true_data, 0), pred_data, tf.zeros_like(pred_data))
    with tf.Session() as sess:
        return sess.run(-K.sum(alpha * K.pow(1. - pt_1, gamma) * K.log(pt_1)) - K.sum((1-alpha) * K.pow(pt_0, gamma) * K.log(1. - pt_0)))
After each epoch the loss value stays constant (5589.60328). What's wrong with it?
When computing the loss you call focal_loss_fixed() which uses TensorFlow to compute the loss value. focal_loss_fixed() creates a graph and runs it in a session to get the value, and by this point PyTorch has no idea of the sequence of operations that led to the loss because they were computed by the TensorFlow backend. It is likely then, that all PyTorch sees in loss is a constant, as if you had written
loss = 3
So the gradient will be zero, and the parameters will never be updated. I suggest you rewrite your loss function using PyTorch operations so that the gradient with respect to its inputs can be computed.
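For instance, a rough PyTorch-only version of the same focal loss might look like this (a sketch that assumes pred_data holds probabilities, as in the question; it is not a tested drop-in replacement):

import torch

def focal_loss_torch(true_data, pred_data, gamma=2., alpha=.25, eps=1e-7):
    # mirrors the TensorFlow version, but uses only differentiable torch ops
    pred_data = pred_data.clamp(eps, 1 - eps)
    pt_1 = torch.where(true_data == 1, pred_data, torch.ones_like(pred_data))
    pt_0 = torch.where(true_data == 0, pred_data, torch.zeros_like(pred_data))
    return (-(alpha * (1. - pt_1).pow(gamma) * pt_1.log()).sum()
            - ((1 - alpha) * pt_0.pow(gamma) * (1. - pt_0).log()).sum())

Because every operation stays inside PyTorch's autograd graph, loss.backward() can then propagate gradients back to the network parameters.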
I think the problem lies in your heavy weight decay.
Essentially, you are not reducing the weight by x, but rather you multiply the weights by x, which means that you are instantaneously only doing very small increments, leading to a (seemingly) plateauing loss function.
More explanation on this can be found in the PyTorch discussion forum (e.g., here, or here).
Unfortunately, the source for SGD alone also does not tell you much about its implementation.
Simply setting it to a larger value should result in better updates. You can start by leaving it out completely, and then iteratively reducing it (from 1.0), until you get more decent results.
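As a concrete starting point for that experiment, leaving weight decay out entirely would look like this with the optimizer from the question:

# same optimizer as in the question, but with weight_decay removed as a baseline;
# reintroduce and tune it once the loss starts moving again
optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0.9)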