Early stop when validation loss satisfies certain criteria

Early stop when validation loss satisfies certain criteria - python

I am training a neural net model in Keras. I want to monitor the validation loss and stop the training when certain condition is attained.
I know I can use EarlyStopping to stop the training when there is no improvement in training for a given number of patience rounds.
I want to something different. I want to stop the training when the val_loss is going above a value say x after n rounds.
To make things clear, Let's say x in 0.5 and n is 50. I want to stop the model's training only if the epoch number is greater than 50 and val_loss is above 0.5.
How can I do this in Keras.?

You can define your own callback by inheriting from the Keras EarlyStopping callback and overriding it with your own logic:
from keras.callbacks import EarlyStopping # use as base class
class MyCallBack(EarlyStopping):
def __init__(self, threshold, min_epochs, **kwargs):
super(MyCallBack, self).__init__(**kwargs)
self.threshold = threshold # threshold for validation loss
self.min_epochs = min_epochs # min number of epochs to run
def on_epoch_end(self, epoch, logs=None):
current = logs.get(self.monitor)
if current is None:
warnings.warn(
'Early stopping conditioned on metric `%s` '
'which is not available. Available metrics are: %s' %
(self.monitor, ','.join(list(logs.keys()))), RuntimeWarning
)
return
# implement your own logic here
if (epoch >= self.min_epochs) & (current >= self.threshold):
self.stopped_epoch = epoch
self.model.stop_training = True
Small example to illustrate that it should work:
from keras.layers import Input, Dense
from keras.models import Model
import numpy as np
# Generate some random data
features = np.random.rand(100, 5)
labels = np.random.rand(100, 1)
validation_feat = np.random.rand(100, 5)
validation_labels = np.random.rand(100, 1)
# Define a simple model
input_layer = Input((5, ))
dense_layer = Dense(10)(input_layer)
output_layer = Dense(1)(dense_layer)
model = Model(inputs=input_layer, outputs=output_layer)
model.compile(loss='mse', optimizer='sgd')
# Fit with custom callback
callbacks = [MyCallBack(threshold=0.001, min_epochs=10, verbose=1)]
model.fit(features, labels, validation_data=(validation_feat, validation_labels), callbacks=callbacks, epochs=100)

Related

Early stopping with multiple conditions

I am doing multi-class classification for a recommender system (item recommendations), and I'm currently training my network using sparse_categorical_crossentropy loss. Therefore, it is reasonable to perform EarlyStopping by monitoring my validation loss, val_loss as such:
tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)
which works as expected. However, the performance of the network (recommender system) is measured by Average-Precision-at-10, and is tracked as a metric during training, as average_precision_at_k10. Because of this, I could also perform early stopping with this metric as such:
tf.keras.callbacks.EarlyStopping(monitor='average_precision_at_k10', patience=10)
which also works as expected.
My problem:
Sometimes the validation loss increases, whilst the Average-Precision-at-10 is improving and vice-versa. Because of this, I would need to monitor both, and perform early stopping, if and only if both are deteriorating. What I would like to do:
tf.keras.callbacks.EarlyStopping(monitor=['val_loss', 'average_precision_at_k10'], patience=10)
which obviously does not work. Any ideas how this could be done?

With guidance from Gerry P above I managed to create my own custom EarlyStopping callback, and thought I post it here in case anyone else are looking to implement something similar.
If both the validation loss and the mean average precision at 10 does not improve for patience number of epochs, early stopping is performed.
class CustomEarlyStopping(keras.callbacks.Callback):
def __init__(self, patience=0):
super(CustomEarlyStopping, self).__init__()
self.patience = patience
self.best_weights = None
def on_train_begin(self, logs=None):
# The number of epoch it has waited when loss is no longer minimum.
self.wait = 0
# The epoch the training stops at.
self.stopped_epoch = 0
# Initialize the best as infinity.
self.best_v_loss = np.Inf
self.best_map10 = 0
def on_epoch_end(self, epoch, logs=None):
v_loss=logs.get('val_loss')
map10=logs.get('val_average_precision_at_k10')
# If BOTH the validation loss AND map10 does not improve for 'patience' epochs, stop training early.
if np.less(v_loss, self.best_v_loss) and np.greater(map10, self.best_map10):
self.best_v_loss = v_loss
self.best_map10 = map10
self.wait = 0
# Record the best weights if current results is better (less).
self.best_weights = self.model.get_weights()
else:
self.wait += 1
if self.wait >= self.patience:
self.stopped_epoch = epoch
self.model.stop_training = True
print("Restoring model weights from the end of the best epoch.")
self.model.set_weights(self.best_weights)
def on_train_end(self, logs=None):
if self.stopped_epoch > 0:
print("Epoch %05d: early stopping" % (self.stopped_epoch + 1))
It is then used as:
model.fit(
x_train,
y_train,
batch_size=64,
steps_per_epoch=5,
epochs=30,
verbose=0,
callbacks=[CustomEarlyStopping(patience=10)],
)

You can achieve this by by creating a custom callback. Information on how to do that is located here. Below is some code that illustrates what you can do in a custom callback. The documentation I referenced shows many other options.
class LRA(keras.callbacks.Callback): # subclass the callback class
# create class variables as below. These can be accessed in your code outside the class definition as LRA.my_class_variable, LRA.best_weights
my_class_variable=something # a class variable
best_weights=model.get_weights() # another class variable
# define an initialization function with parameters you want to feed to the callback
def __init__(self, param1, param2, etc):
super(LRA, self).__init__()
self.param1=param1
self.param2=param2
etc for all parameters
# write any initialization code you need here
def on_epoch_end(self, epoch, logs=None): # method runs on the end of each epoch
v_loss=logs.get('val_loss') # example of getting log data at end of epoch the validation loss for this epoch
acc=logs.get('accuracy') # another example of getting log data
LRA.best_weights=model.get_weights() # example of setting class variable value
print(f'Hello epoch {epoch} has just ended') # print a message at the end of every epoch
lr=float(tf.keras.backend.get_value(self.model.optimizer.lr)) # get the current learning rate
if v_loss > self.param1:
new_lr=lr * self.param2
tf.keras.backend.set_value(model.optimizer.lr, new_lr) # set the learning rate in the optimizer
# write whatever code you need

I recommend you to create your own callback.
In the following I added a solution that monitors both the accuracy and the loss. You can replace the acc with your own metric:
class CustomCallback(keras.callbacks.Callback):
acc = {}
loss = {}
best_weights = None
def __init__(self, patience=None):
super(CustomCallback, self).__init__()
self.patience = patience
def on_epoch_end(self, epoch, logs=None):
epoch += 1
self.loss[epoch] = logs['loss']
self.acc[epoch] = logs['accuracy']
if self.patience and epoch > self.patience:
# best weight if the current loss is less than epoch-patience loss. Simiarly for acc but when larger
if self.loss[epoch] < self.loss[epoch-self.patience] and self.acc[epoch] > self.acc[epoch-self.patience]:
self.best_weights = self.model.get_weights()
else:
# to stop training
self.model.stop_training = True
# Load the best weights
self.model.set_weights(self.best_weights)
else:
# best weight are the current weights
self.best_weights = self.model.get_weights()
Please bear in mind that if you want to control the minimum change in the monitored quantity (aka. min_delta) you have to integrate it in the code.
Here is the documentation for how to build your custome callback: custom_callback

At this point it would be more simple to make a custom loop and just use if-statements. E.g.:
def main(epochs=50):
for epoch in range(epochs):
fit(epoch)
if test_acc.result() > .8 and topk_acc.result() > .9:
print(f'\nEarly stopping. Test acc is above 80% and TopK acc is above 90%.')
break
if __name__ == '__main__':
main(epochs=100)
Here's a simple custom training loop using this method:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow_datasets as tfds
import tensorflow as tf
data, info = tfds.load('iris', split='train',
as_supervised=True,
shuffle_files=True,
with_info=True)
def preprocessing(inputs, targets):
scaled = tf.divide(inputs, tf.reduce_max(inputs, axis=0))
return scaled, targets
dataset = data.filter(lambda x, y: tf.less_equal(y, 2)).\
map(preprocessing).\
shuffle(info.splits['train'].num_examples)
train_dataset = dataset.take(120).batch(4)
test_dataset = dataset.skip(120).take(30).batch(4)
model = tf.keras.Sequential([
tf.keras.layers.Dense(8, activation='relu'),
tf.keras.layers.Dense(16, activation='relu'),
tf.keras.layers.Dense(info.features['label'].num_classes, activation='softmax')
])
loss_object = tf.losses.SparseCategoricalCrossentropy(from_logits=True)
train_loss = tf.metrics.Mean()
test_loss = tf.metrics.Mean()
train_acc = tf.metrics.SparseCategoricalAccuracy()
test_acc = tf.metrics.SparseCategoricalAccuracy()
topk_acc = tf.metrics.SparseTopKCategoricalAccuracy(k=2)
opt = tf.keras.optimizers.Adam(learning_rate=1e-3)
#tf.function
def train_step(inputs, labels):
with tf.GradientTape() as tape:
logits = model(inputs)
loss = loss_object(labels, logits)
gradients = tape.gradient(loss, model.trainable_variables)
opt.apply_gradients(zip(gradients, model.trainable_variables))
train_loss(loss)
train_acc(labels, logits)
#tf.function
def test_step(inputs, labels):
logits = model(inputs)
loss = loss_object(labels, logits)
test_loss.update_state(loss)
test_acc.update_state(labels, logits)
topk_acc.update_state(labels, logits)
def fit(epoch):
template = 'Epoch {:>2} Train Loss {:.3f} Test Loss {:.3f} ' \
'Train Acc {:.2f} Test Acc {:.2f} Test TopK Acc {:.2f} '
train_loss.reset_states()
test_loss.reset_states()
train_acc.reset_states()
test_acc.reset_states()
topk_acc.reset_states()
for X_train, y_train in train_dataset:
train_step(X_train, y_train)
for X_test, y_test in test_dataset:
test_step(X_test, y_test)
print(template.format(
epoch + 1,
train_loss.result(),
test_loss.result(),
train_acc.result(),
test_acc.result(),
topk_acc.result()
))
def main(epochs=50):
for epoch in range(epochs):
fit(epoch)
if test_acc.result() > .8 and topk_acc.result() > .9:
print(f'\nEarly stopping. Test acc is above 80% and TopK acc is above 90%.')
break
if __name__ == '__main__':
main(epochs=100)

what is the pytorch equivalent of a tensorflow linear regression?

I am learning pytorch, that to do a basic linear regression on this data created this way here:
from sklearn.datasets import make_regression
x, y = make_regression(n_samples=100, n_features=1, noise=15, random_state=42)
y = y.reshape(-1, 1)
print(x.shape, y.shape)
plt.scatter(x, y)
I know that using tensorflow this code can solve:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(units=1, activation='linear', input_shape=(x.shape[1], )))
model.compile(optimizer=tf.keras.optimizers.SGD(lr=0.05), loss='mse')
hist = model.fit(x, y, epochs=15, verbose=0)
but I need to know what the pytorch equivalent would be like, what I tried to do was this:
# Model Class
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.linear = nn.Linear(1,1)
def forward(self, x):
x = self.linear(x)
return x
def predict(self, x):
return self.forward(x)
model = Net()
loss_fn = F.mse_loss
opt = torch.optim.SGD(modelo.parameters(), lr=0.05)
# Funcao para treinar
def fit(num_epochs, model, loss_fn, opt, train_dl):
# Repeat for given number of epochs
for epoch in range(num_epochs):
# Train with batches of data
for xb, yb in train_dl:
# 1. Generate predictions
pred = model(xb)
# 2. Calculate Loss
loss = loss_fn(pred, yb)
# 3. Campute gradients
loss.backward()
# 4. Update parameters using gradients
opt.step()
# 5. Reset the gradients to zero
opt.zero_grad()
# Print the progress
if (epoch+1) % 10 == 0:
print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))
# Training
fit(200, model, loss_fn, opt, data_loader)
But the model doesn't learn anything, I don't know what I can do anymore.
The input/output dimensions is (1/1)

Dataset
First of all, you should define torch.utils.data.Dataset
import torch
from sklearn.datasets import make_regression
class RegressionDataset(torch.utils.data.Dataset):
def __init__(self):
data = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=42)
self.x = torch.from_numpy(data[0]).float()
self.y = torch.from_numpy(data[1]).float()
def __len__(self):
return len(self.x)
def __getitem__(self, index):
return self.x[index], self.y[index]
It converts numpy data to PyTorch's tensor inside __init__ and converts data to float (numpy has double by default while PyTorch's default is float in order to use less memory).
Apart from that it will simply return tuple of features and respective regression targets.
Fit
Almost there, but you have to flatten output from the model (described below). torch.nn.Linear will return tensors of shape (batch, 1) while your targets are of shape (batch,). flatten() will remove unnecessary 1 dimension.
# 2. Calculate Loss
loss = criterion(pred.flatten(), yb)
Model
That is all you need actually:
model = torch.nn.Linear(1, 1)
Any layer can be called directly, no need for forward and inheritance for simple models.
Calling
The rest is almost okay, you just have to create torch.utils.data.DataLoader and pass instance of our dataset. What DataLoader does is it issues __getitem__ of dataset multiple times and creates a batch of specified size (there is some other funny business, but that's the idea):
dataset = RegressionDataset()
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32)
model = torch.nn.Linear(1, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=3e-4)
fit(5000, model, criterion, optimizer, dataloader)
Also notice I've used torch.nn.MSELoss(), as we are passing object it looks better than function in this case.
Whole code
To make it easier:
import torch
from sklearn.datasets import make_regression
class RegressionDataset(torch.utils.data.Dataset):
def __init__(self):
data = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=42)
self.x = torch.from_numpy(data[0]).float()
self.y = torch.from_numpy(data[1]).float()
def __len__(self):
return len(self.x)
def __getitem__(self, index):
return self.x[index], self.y[index]
# Funcao para treinar
def fit(num_epochs, model, criterion, optimizer, train_dl):
# Repeat for given number of epochs
for epoch in range(num_epochs):
# Train with batches of data
for xb, yb in train_dl:
# 1. Generate predictions
pred = model(xb)
# 2. Calculate Loss
loss = criterion(pred.flatten(), yb)
# 3. Compute gradients
loss.backward()
# 4. Update parameters using gradients
optimizer.step()
# 5. Reset the gradients to zero
optimizer.zero_grad()
# Print the progress
if (epoch + 1) % 10 == 0:
print(
"Epoch [{}/{}], Loss: {:.4f}".format(epoch + 1, num_epochs, loss.item())
)
dataset = RegressionDataset()
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32)
model = torch.nn.Linear(1, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=3e-4)
fit(5000, model, criterion, optimizer, dataloader)
You should get around 0.053 loss or so, vary noise or other params for harder/easier regression task.

How to replace loss function during training tensorflow.keras

I want to replace the loss function related to my neural network during training, this is the network:
model = tensorflow.keras.models.Sequential()
model.add(tensorflow.keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu", input_shape=input_shape))
model.add(tensorflow.keras.layers.Conv2D(64, (3, 3), activation="relu"))
model.add(tensorflow.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tensorflow.keras.layers.Dropout(0.25))
model.add(tensorflow.keras.layers.Flatten())
model.add(tensorflow.keras.layers.Dense(128, activation="relu"))
model.add(tensorflow.keras.layers.Dropout(0.5))
model.add(tensorflow.keras.layers.Dense(output_classes, activation="softmax"))
model.compile(loss=tensorflow.keras.losses.categorical_crossentropy, optimizer=tensorflow.keras.optimizers.Adam(0.001), metrics=['accuracy'])
history = model.fit(x_train, y_train, batch_size=128, epochs=5, validation_data=(x_test, y_test))
so now I want to change tensorflow.keras.losses.categorical_crossentropy with another, so I made this:
model.compile(loss=tensorflow.keras.losses.mse, optimizer=tensorflow.keras.optimizers.Adam(0.001), metrics=['accuracy'])
history = model.fit(x_improve, y_improve, epochs=1, validation_data=(x_test, y_test)) #FIXME bug during training
but I have this error:
ValueError: No gradients provided for any variable: ['conv2d/kernel:0', 'conv2d/bias:0', 'conv2d_1/kernel:0', 'conv2d_1/bias:0', 'dense/kernel:0', 'dense/bias:0', 'dense_1/kernel:0', 'dense_1/bias:0'].
Why? How can I fix it? There is another way to change loss function?
Thanks

I'm currently working on google colab with Tensorflow and Keras and i was not able to recompile a model mantaining the weights, every time i recompile a model like this:
with strategy.scope():
model = hd_unet_model(INPUT_SIZE)
model.compile(optimizer=Adam(lr=0.01),
loss=tf.keras.losses.MeanSquaredError() ,
metrics=[tf.keras.metrics.MeanSquaredError()])
the weights gets resetted.
so i found an other solution, all you need to do is:
Get the model with the weights you want ( load it or something else )
gets the weights of the model like this:
weights = model.get_weights()
recompile the model ( to change the loss function )
set again the weights of the recompiled model like this:
model.set_weights(weights)
launch the training
i tested this method and it seems to work.
so to change the loss mid-Training you can:
Compile with the first loss.
Train of the first loss.
Save the weights.
Recompile with the second loss.
Load the weights.
Train on the second loss.

So, a straightforward answer I would give is: switch to pytorch if you want to play this kind of games. Since in pytorch you define your training and evaluation functions, it takes just an if statement to switch from a loss function to another one.
Also, I see in your code that you want to switch from cross_entropy to mean_square_error, the former is suitable for classification the latter for regression, so this is not really something you can do, in the code that follows I switched from mean squared error to mean squared logarithmic error, which are both loss suitable for regression.
Despite other answers offers solutions to your question (see change-loss-function-dynamically-during-training) it is not clear wether you can trust or not the results. Some people found that even with a customised function sometimes Keras keep training with the first loss.
Solution:
My solution is based on train_on_batch, which allows us to train a model in a for loop and therefore stop training it whenever we prefer to recompile the model with a new loss function. Please note that recompiling the model does not reset the weights (see:Does recompiling a model re-initialize the weights?).
The dataset can be found here Boston housing dataset
# Regression Example With Boston Dataset: Standardized and Larger
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from keras.losses import mean_squared_error, mean_squared_logarithmic_error
from matplotlib import pyplot
import matplotlib.pyplot as plt
# load dataset
dataframe = read_csv("housing.csv", delim_whitespace=True, header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:13]
y = dataset[:,13]
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.33, random_state=42)
# create model
model = Sequential()
model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
model.add(Dense(6, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal'))
batch_size = 25
# have to define manually a dict to store all epochs scores
history = {}
history['history'] = {}
history['history']['loss'] = []
history['history']['mean_squared_error'] = []
history['history']['mean_squared_logarithmic_error'] = []
history['history']['val_loss'] = []
history['history']['val_mean_squared_error'] = []
history['history']['val_mean_squared_logarithmic_error'] = []
# first compiling with mse
model.compile(loss='mean_squared_error', optimizer='adam', metrics=[mean_squared_error, mean_squared_logarithmic_error])
# define number of iterations in training and test
train_iter = round(trainX.shape[0]/batch_size)
test_iter = round(testX.shape[0]/batch_size)
for epoch in range(2):
# train iterations
loss, mse, msle = 0, 0, 0
for i in range(train_iter):
start = i*batch_size
end = i*batch_size + batch_size
batchX = trainX[start:end,]
batchy = trainy[start:end,]
loss_, mse_, msle_ = model.train_on_batch(batchX,batchy)
loss += loss_
mse += mse_
msle += msle_
history['history']['loss'].append(loss/train_iter)
history['history']['mean_squared_error'].append(mse/train_iter)
history['history']['mean_squared_logarithmic_error'].append(msle/train_iter)
# test iterations
val_loss, val_mse, val_msle = 0, 0, 0
for i in range(test_iter):
start = i*batch_size
end = i*batch_size + batch_size
batchX = testX[start:end,]
batchy = testy[start:end,]
val_loss_, val_mse_, val_msle_ = model.test_on_batch(batchX,batchy)
val_loss += val_loss_
val_mse += val_mse_
val_msle += msle_
history['history']['val_loss'].append(val_loss/test_iter)
history['history']['val_mean_squared_error'].append(val_mse/test_iter)
history['history']['val_mean_squared_logarithmic_error'].append(val_msle/test_iter)
# recompiling the model with new loss
model.compile(loss='mean_squared_logarithmic_error', optimizer='adam', metrics=[mean_squared_error, mean_squared_logarithmic_error])
for epoch in range(2):
# train iterations
loss, mse, msle = 0, 0, 0
for i in range(train_iter):
start = i*batch_size
end = i*batch_size + batch_size
batchX = trainX[start:end,]
batchy = trainy[start:end,]
loss_, mse_, msle_ = model.train_on_batch(batchX,batchy)
loss += loss_
mse += mse_
msle += msle_
history['history']['loss'].append(loss/train_iter)
history['history']['mean_squared_error'].append(mse/train_iter)
history['history']['mean_squared_logarithmic_error'].append(msle/train_iter)
# test iterations
val_loss, val_mse, val_msle = 0, 0, 0
for i in range(test_iter):
start = i*batch_size
end = i*batch_size + batch_size
batchX = testX[start:end,]
batchy = testy[start:end,]
val_loss_, val_mse_, val_msle_ = model.test_on_batch(batchX,batchy)
val_loss += val_loss_
val_mse += val_mse_
val_msle += msle_
history['history']['val_loss'].append(val_loss/test_iter)
history['history']['val_mean_squared_error'].append(val_mse/test_iter)
history['history']['val_mean_squared_logarithmic_error'].append(val_msle/test_iter)
# Some plots to check what is going on
# loss function
pyplot.subplot(311)
pyplot.title('Loss')
pyplot.plot(history['history']['loss'], label='train')
pyplot.plot(history['history']['val_loss'], label='test')
pyplot.legend()
# Only mean squared error
pyplot.subplot(312)
pyplot.title('Mean Squared Error')
pyplot.plot(history['history']['mean_squared_error'], label='train')
pyplot.plot(history['history']['val_mean_squared_error'], label='test')
pyplot.legend()
# Only mean squared logarithmic error
pyplot.subplot(313)
pyplot.title('Mean Squared Logarithmic Error')
pyplot.plot(history['history']['mean_squared_logarithmic_error'], label='train')
pyplot.plot(history['history']['val_mean_squared_logarithmic_error'], label='test')
pyplot.legend()
plt.tight_layout()
pyplot.show()
The resulting plot confirm that the loss function is changing after the second epoch:
The drop in the loss function is due to the fact that the model is switching from normal mean squared error to the logarithmic one, which has much lower values. Printing the scores also prove that the used loss truly changed:
print(history['history']['loss'])
[599.5209197998047, 570.4041115897043, 3.8622902120862688, 2.1578191178185597]
print(history['history']['mean_squared_error'])
[599.5209197998047, 570.4041115897043, 510.29034205845426, 425.32058388846264]
print(history['history']['mean_squared_logarithmic_error'])
[8.624503476279122, 6.346359729766846, 3.8622902120862688, 2.1578191178185597]
In the first two epochs the values of loss are equal to ones of mean_square_error and during the third and fourth epochs the values becomes equal to the ones of mean_square_logarithmic_error, which is the new loss that was set. So it seems that using train_on_batch allows to change loss function, nevertheless I want to stress out again that this is basically what one should do on pytoch to achieve the same results, with the difference that the behaviour of pytorch (in this scenario and in my opinion) is more reliable.

tf.keras.Sequential binary classification model predicting [0.5, 0.5] or close to

I am currently trying to build a model to classify whether or not the outcome of a given football match will be above or below 2.5 goals, based on the Home team, Away team & game league, using a tf.keras.Sequential model in TensorFlow 2.0RC.
The problem I am encountering is that my softmax results converge on [0.5,0.5] when using the model.predict method. What makes this odd is that my validation & test accuracy and losses are about 0.94 & 0.12 respectively after 1000 epochs of training, otherwise I would have put this down to an overfitting problem. I am aware that 1000 epochs is extremely likely to overfit, however, I want to understand why my accuracy increases until about 800 epochs in. My loss flattens at about 300 epochs.
I have tried to alter the number of layers, number of units in each layer, the activation functions, optimizers and loss functions, number of epochs and learning rates, but can only seem to increase the losses.
The results still seem to converge toward [0.5,0.5] regardless.
The full code can be viewed at https://github.com/AhmUgEk/tensorflow_football_predictions, but below is an extract showing model composition.
# Create Keras Sequential model:
model = keras.Sequential()
model.add(feature_layer) # Input processing layer.
model.add(Dense(units=32, activation='relu')) # Hidden Layer 1.
model.add(Dropout(rate=0.4))
model.add(BatchNormalization())
model.add(Dense(units=32, activation='relu')) # Hidden Layer 2.
model.add(Dropout(rate=0.4))
model.add(BatchNormalization())
model.add(Dense(units=2, activation='softmax')) # Output layer.
# Compile the model:
model.compile(
optimizer=keras.optimizers.Adam(learning_rate=0.0001),
loss=keras.losses.MeanSquaredLogarithmicError(),
metrics=['accuracy']
)
# Compile the model:
model.compile(
optimizer=keras.optimizers.Adam(learning_rate=0.0001),
loss=keras.losses.MeanSquaredLogarithmicError(),
metrics=['accuracy']
)
# Fit the model to the training dataset and validate against the
validation dataset between epochs:
model.fit(
train_dataset,
validation_data=val_dataset,
epochs=1000,
callbacks=[tensorboard_callback]
)
I would expect to receive a result of [0.282, 0.718] for example for an input of:
model.predict_classes([np.array(['E0'], dtype='object'),
np.array(['Liverpool'], dtype='object'),
np.array(['Newcastle'], dtype='object')])[0]
but as per the above, receive a result of say [0.5, 0.5].
Am I missing something obvious here?

I had made some minor changes in the model. Now, I am not getting exactly [0.5, 0.5].
Result:
[[0.61482537 0.3851746 ]
[0.5121426 0.48785746]
[0.48058605 0.51941395]
[0.48913187 0.51086813]
[0.45480043 0.5451996 ]
[0.48933673 0.5106633 ]
[0.43431875 0.5656812 ]
[0.55314165 0.4468583 ]
[0.5365097 0.4634903 ]
[0.54371756 0.45628244]]
Implementation:
import datetime
import os
import numpy as np
import pandas as pd
import tensorflow as tf
from gpu_limiter import limit_gpu
from pipe_functions import csv_to_df, dataframe_to_dataset
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras.layers import BatchNormalization, Dense, DenseFeatures, Dropout, Input
from tensorflow.keras.callbacks import TensorBoard, ModelCheckpoint
import tensorflow.keras.backend as K
from tensorflow.data import Dataset
# Test GPU availability and instantiate memory growth limitation if True:
if tf.test.is_gpu_available():
print('GPU Available\n')
limit_gpu()
else:
print('Running on CPU')
df = csv_to_df("./csv_files")
# Format & organise imported data, making the "Date" column the new index:
df['Date'] = pd.to_datetime(df['Date'])
df = df[['Date', 'Div', 'HomeTeam', 'AwayTeam', 'FTHG', 'FTAG']].dropna().set_index('Date').sort_index()
df['Over_2.5'] = (df['FTHG'] + df['FTAG'] > 2.5).astype(int)
df = df.drop(['FTHG', 'FTAG'], axis=1)
# Split data into training, validation and testing data:
# Note: random_state variable set to ensure reproducibility.
train, test = train_test_split(df, test_size=0.05, random_state=42)
train, val = train_test_split(train, test_size=0.05, random_state=42)
# print(df['Over_2.5'].value_counts()) # Check that data is balanced.
# Create datasets from train, val & test dataframes:
target_col = 'Over_2.5'
batch_size = 32
def df_to_dataset(features: np.ndarray, labels: np.ndarray, shuffle=True, batch_size=8) -> Dataset:
ds = Dataset.from_tensor_slices(({"feature": features}, {"target": labels}))
if shuffle:
ds = ds.shuffle(buffer_size=len(features))
ds = ds.batch(batch_size)
return ds
def get_feature_transform() -> DenseFeatures:
# Format features into feature columns to ensure data is in the correct format for feeding into the model:
feature_cols = []
for column in filter(lambda x: x != target_col, df.columns):
feature_cols.append(tf.feature_column.embedding_column(tf.feature_column.categorical_column_with_vocabulary_list(
key=column, vocabulary_list=df[column].unique()), dimension=5))
return DenseFeatures(feature_cols)
# Transforms all features into dense tensors.
feature_transform = get_feature_transform()
train_features = feature_transform(dict(train)).numpy()
val_features = feature_transform(dict(val)).numpy()
test_features = feature_transform(dict(test)).numpy()
train_dataset = df_to_dataset(train_features, train[target_col].values, shuffle=True, batch_size=batch_size)
val_dataset = df_to_dataset(val_features, val[target_col].values, shuffle=True, batch_size=batch_size) # Shuffle not required to validation data.
test_dataset = df_to_dataset(test_features, test[target_col].values, shuffle=True, batch_size=batch_size) # Shuffle not required to test data.
# Create Keras Functional API:
# Create a feature layer from the feature columns, to be placed at the input layer of the model:
def build_model(input_shape: tuple) -> keras.Model:
input_layer = keras.Input(shape=input_shape, name='feature')
model = Dense(units=1028, activation='relu', kernel_initializer='normal', name='dense0')(input_layer) # Hidden Layer 1.
model = BatchNormalization(name='bc0')(model)
model = Dense(units=1028, activation='relu', kernel_initializer='normal', name='dense1')(model) # Hidden Layer 2.
model = Dropout(rate=0.1)(model)
model = BatchNormalization(name='bc1')(model)
model = Dense(units=100, activation='relu', kernel_initializer='normal', name='dense2')(model) # Hidden Layer 3.
model = Dropout(rate=0.25)(model)
model = BatchNormalization(name='bc2')(model)
model = Dense(units=50, activation='relu', kernel_initializer='normal', name='dense3')(model) # Hidden Layer 4.
model = Dropout(rate=0.4)(model)
model = BatchNormalization(name='bc3')(model)
output_layer = Dense(units=2, activation='softmax', kernel_initializer='normal', name='target')(model) # Output layer.
model = keras.Model(inputs=input_layer, outputs=output_layer, name='better-than-chance')
# Compile the model:
model.compile(
optimizer=keras.optimizers.Adam(learning_rate=0.001),
loss='mse',
metrics=['accuracy']
)
return model
# # Create a TensorBoard log file (time appended) directory for every run of the model:
# directory = ".\\logs\\" + str(datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
# os.mkdir(directory)
# # Create a TensorBoard callback to log a record of model performance for every 1 epoch:
# tensorboard_callback = TensorBoard(log_dir=directory, histogram_freq=1, write_graph=True, write_images=True)
# Run "tensorboard --logdir .\logs" in anaconda prompt to review & compare logged results.
# Note: Make sure that the correct environment is activated before running.
model = build_model((train_features.shape[1],))
model.summary()
# checkpoint = ModelCheckpoint('model-{epoch:03d}.h5', verbose=1, monitor='val_loss',save_best_only=True, mode='auto')
# Fit the model to the training dataset and validate against the validation dataset between epochs:
model.fit(
train_dataset,
validation_data=val_dataset,
epochs=10)
# callbacks=[checkpoint]
# Saves and reloads model.
# model.save("./model.h5")
# model_from_saved = keras.models.load_model("./model.h5")
# Evaluate model accuracy against test dataset:
# scores, accuracy = model.evaluate(train_dataset)
# print('Accuracy:', accuracy)
##############
## OPTIONAL ##
##############
# DUBUGGING
# inp = model.input # input placeholder
# outputs = [layer.output for layer in model.layers] # all layer outputs
# functors = [K.function([inp], [out]) for out in outputs] # evaluation functions
# # Testing
# layer_outs = [func([test_features]) for func in functors]
# print(layer_outs)
# # # Form a prediction based on inputs:
prediction = model.predict({"feature": test_features[:10]})
print(prediction)
One thing you can do is to try some ensemble Learning methods like
RandomForest
and
XGBoost
and compare the results.
You should try is to add other Key Performance Indicators(KPI)s in
your data and then try to fit the model.

Validation loss not moving with MLP in Regression

Given input features as such, just raw numbers:
tensor([0.2153, 0.2190, 0.0685, 0.2127, 0.2145, 0.1260, 0.1480, 0.1483, 0.1489,
0.1400, 0.1906, 0.1876, 0.1900, 0.1925, 0.0149, 0.1857, 0.1871, 0.2715,
0.1887, 0.1804, 0.1656, 0.1665, 0.1137, 0.1668, 0.1168, 0.0278, 0.1170,
0.1189, 0.1163, 0.2337, 0.2319, 0.2315, 0.2325, 0.0519, 0.0594, 0.0603,
0.0586, 0.0067, 0.0624, 0.2691, 0.0617, 0.2790, 0.2805, 0.2848, 0.2454,
0.1268, 0.2483, 0.2454, 0.2475], device='cuda:0')
And the expected output is a single real number output, e.g.
tensor(-34.8500, device='cuda:0')
Full code on https://www.kaggle.com/alvations/pytorch-mlp-regression
I've tried creating a simple 2 layer network with:
class MLP(nn.Module):
def __init__(self, input_size, output_size, hidden_size):
super(MLP, self).__init__()
self.linear = nn.Linear(input_size, hidden_size)
self.classifier = nn.Linear(hidden_size, output_size)
def forward(self, inputs, hidden=None, dropout=0.5):
inputs = F.dropout(inputs, dropout) # Drop-in.
# First Layer.
output = F.relu(self.linear(inputs))
# Matrix manipulation magic.
batch_size, sequence_len, hidden_size = output.shape
# Technically, linear layer takes a 2-D matrix as input, so more manipulation...
output = output.contiguous().view(batch_size * sequence_len, hidden_size)
# Apply dropout.
output = F.dropout(output, dropout)
# Put it through the classifier
# And reshape it to [batch_size x sequence_len x vocab_size]
output = self.classifier(output).view(batch_size, sequence_len, -1)
return output
And training as such:
# Training routine.
def train(num_epochs, dataloader, valid_dataset, model, criterion, optimizer):
losses = []
valid_losses = []
learning_rates = []
plt.ion()
x_valid, y_valid = valid_dataset
for _e in range(num_epochs):
for batch in tqdm(dataloader):
# Zero gradient.
optimizer.zero_grad()
#print(batch)
this_x = torch.tensor(batch['x'].view(len(batch['x']), 1, -1)).to(device)
this_y = torch.tensor(batch['y'].view(len(batch['y']), 1, 1)).to(device)
# Feed forward.
output = model(this_x)
prediction, _ = torch.max(output, dim=1)
loss = criterion(prediction, this_y.view(len(batch['y']), -1))
loss.backward()
optimizer.step()
losses.append(torch.sqrt(loss.float()).data)
with torch.no_grad():
# Zero gradient.
optimizer.zero_grad()
output = model(x_valid.view(len(x_valid), 1, -1))
prediction, _ = torch.max(output, dim=1)
loss = criterion(prediction, y_valid.view(len(y_valid), -1))
valid_losses.append(torch.sqrt(loss.float()).data)
clear_output(wait=True)
plt.plot(losses, label='Train')
plt.plot(valid_losses, label='Valid')
plt.legend()
plt.pause(0.05)
Tuning several hyperparameters, it looks like the model doesn't train well, the validation loss doesn't move at all e.g.
hyperparams = Hyperparams(input_size=train_dataset.x.shape[1],
output_size=1,
hidden_size=150,
loss_func=nn.MSELoss,
learning_rate=1e-8,
optimizer=optim.Adam,
batch_size=500)
And it's loss curve:
Any idea what's wrong with the network?
Am I training the regression model with the wrong loss? Or I've just not yet found the right hyperparameters?
Or am I validating the model wrongly?

From the code you provided, it is tough to say why the validation loss is constant but I see several problems in your code.
Why do you validate for each training mini-batch? Instead, you should validate your model after you do the training for one complete epoch (iterating over your full dataset once). So, the skeleton should be like:
for _e in range(num_epochs):
for batch in tqdm(train_dataloader):
# training code
with torch.no_grad():
for batch in tqdm(valid_dataloader):
# validation code
# plot your loss values
Also, you can plot after each epoch, not after each mini-batch training.
Did you check whether the model parameters are getting updated after optimizer.step() during training? How many validation examples do you have? Why don't you use mini-batch computation during validation?
Why do you do: optimizer.zero_grad() during validation? It doesn't make sense because, during validation, you are not going to do anything related to optimization.
You should use model.eval() during validation to turn off the dropouts. See PyTorch documentation to learn about .train() and .eval() methods.
The learning rate is set to 1e-8, isn't it too small? Why don't you use the default learning rate for Adam (1e-3)?
The following requires some reasoning.
Why are you using such a large batch size? What is your training dataset size?
You can directly plot the MSELoss, instead of taking the square root.
My suggestion would be: use some existing resources for MLP in PyTorch. Don't do it from scratch if you do not have sufficient knowledge at this point. It would make you suffer a lot.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.