Training Loss Improving but Validation Converges Early

Training Loss Improving but Validation Converges Early - python

I am creating a CNN using TensorFlow and when training, I find that the training dataset is still improving (i.e. loss still decreasing), while the test/validation dataset has converged and is no longer improving. (Learning Curve Plot attached below)
Does anyone know why this might be the case and how could I possibly fix it, to have the validation loss reduce along with the training? Would be greatly appreciated!
Plot of my models learning curve:

The plot of losses is very typical. Your model appears to be performing very well with very low MSE losss. At this point you have essentially reached the limits of your models performance. One thing which may help is to use an adjustable learning rate. The Keras callback ReduceLROnPlateau can be setup to monitor the validation loss. If the validation loss fails to decrease for a 'patience' number of epochs the learning rate will be reduced by a factor "factor" where factor is a number less than 1. Documentation is here.
You may also want to use the Keras EarlyStopping callback. This callback can be set to monitor validation loss and halt training if it fails to decrease for "patience" number of epochs. If you set restore_best_weights=True it will leave your model with the weights used in the epoch with the lowest validation loss. This will prevent your model from returning an over fit model. My recommended code is shown below
rlronp=f.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=1)
estop=tf.keras.callbacks.EarlyStopping(monitor="val_loss",patience=3,restore_best_weights=True)
callbacks=[rlronp, estop]
In model.fit include callbacks=callbacks. I suspect neither of the above will provide much improvement. You will probably have to try some changes to your model as well. Adding a Dropout layer may help to some degree to reduce over-fitting as would including regularization. Documentation for that is here.. Of course the standard approach of getting a larger data set may also help but is not always easy to achieve. If you are working with images you could try image augmentation using say the Keras ImageDataGenerator or Tensorflow Image Augmentation layers. Documentation for that is here.. One thing I found which helps for the case of images is to crop your images to just the Region of Interest (ROI). For example if you were doing face recognition cropping the images to just be of the face will help significantly.

this means you're hitting your architecture's limit, training loss will keep decreasing (this is known as overfitting), which will eventually INCREASE validation loss, make changes to the parameters or consider altering your layers (adding, removing, etc.), maybe even look into ways you could alter the dataset.
when this happened to me a while ago I added a LSTM layer to my CNN architecture and also incorporated K-means validation, this is not a walkthrough, you need to figure this out for your specific problem, good luck.

Related

Why does train data performance deteriorate dramatically?

I am training a binary classifier model that classifies between disease and non-disease.
When I run the model, training loss decreased and auc, acc, get increased.
But, after certain epoch train loss increased and auc, acc were decreased.
I don't know why training performance got decreased after certain epoch.
I used general 1d cnn model and methods, details here:
I tried already to:
batch shuffle
introduce class weights
loss change (binary_crossentropy > BinaryFocalLoss)
learning_rate change

Two questions for you going forward.
Does the training and validation accuracy keep dropping - when you would just let it run for let's say 100 epochs? Definitely something I would try.
Which optimizer are you using? SGD? ADAM?
How large is your dropout, maybe this value is too large. Try without and check whether the behavior is still the same.
It might also be the optimizer
As you do not seem to augment (this could be a potential issue if you do by accident break some label affiliation) your data, each epoch should see similar gradients. Thus I guess, at this point in your optimization process, the learning rate and thus the update step is not adjusted properly - hence not allowing to further progress into that local optimum, and rather overstepping the minimum while at the same time decreasing training and validation performance.
This is an intuitive explanation and the next things I would try are:
Scheduling the learning rate
Using a more sophisticated optimizer (starting with ADAM if you are not already using it)

Your model is overfitting. This is why your accuracy increases and then begins decreasing. You need to implement Early Stopping to stop at the Epoch with the best results. You should also implement dropout layers.

What does a sudden increase in accuracy during epoch training show about my model?

I am learning Convolution Neural Network now and practicing it on kaggle digit recognizer (MNIST) dataset.
While training the data, I noticed that inspite of initial gradually growing accuracy, in between there was a huge jump i.e from 0.8984 to 0.9814.
As a beginner, I want to investigate what does this jump really show about my model. Here is the image of the epochs:
enter image description here
I have circled the jump in yellow. Thanks in advance!

As the loss gradually starts to decrease, this create an impact on fitting of the model. The cost function makes the loss go down, which directly creates an impact on the fitting of model. Better the fitting of model into training data, better the accuracy (which we can easily see as the accuracy increases with the reduction in loss). There is almost a difference of 0.08 in your consecutive loss function which is enough for the model to fit more from the current state.
Now as the model progresses, we try it on the testing dataset because the real world data is nothing like the data we trained it on.
However, a higher accuracy might not always be good as the model is considered to be over-evaluated which is also known as overfitting which means the model is performing too well that it can't handle any little changes. Therefore, a correct balance between learning rate and epochs are required in order to predict the classes correctly. It also depends on the architecture, Optimizing function which make sure the oscillations are low and numerous other things.

why the validation accuracy does not increase in a normal way over the epochs?

I'm trying to transfer learning VGG16 model with imagenet in a dataset of retinal images but i'm confused to get a graph like this, I don't know why the validation accuracy didn't increase in a normal way over the epochs, like training accuracy did, is it an index of overfitting ? if yes, how can i overcome it ?

My first suggestion would be to use a ResNet (network that contains residual connections) as a first step towards the improvement, in order to avoid the vanishing gradient problem.
VGGs have become less used are no longer relevant for benchmarking. What you should use instead is a ResNet50, which is available in tensorflow.keras.applications alongside other relevant pre-trained neural networks.
Also, the validation accuracy fluctuates very much; in addition to the previously mentioned possible improvement, you may want to recheck the construction of your training and validation sets.

Strategies for solving overfitting - other options?

I am building a predictive model where I want to know can I predict whether a package will be delivered on time (Binary Yes / No), in the event that the package is not delivered on time, I wish to be able to predict by when it will be delivered in categories of <7days, <14days, <21days >28days after expected date.
I have built and tested a model for binary classification and have got an f Score of 0.92, which is satisfactory for my needs. However, when I train my categorical model, I start to see training accuracy and validation accuracy diverge (training accuracy is much better than validation accuracy). This is a sign of overfitting.
However, I have tried regularization and different values, plus using dropout and different values, and the validation accuracy never gets above 0.7. My total training set is of ~10k examples, ~3k validation, and whilst the catgorical spread is not equal there are sufficient examples of each category (I think). I am using a NN and have increased / decreased both layers and activations and still no joy
Any thoughts on where to go next. Thanks

Because you are using NN, introduce dropout layers. See if it can help to reduce the overfitting problem. And also checkout this How to choose the number of hidden layers and nodes in a feedforward neural network?
The more complex the network (hidden layers, number of neurons in them), also contribute to overfitting problem

The approach we have chosen is to carry out a linear regression with the expected duration as target variable. We have excluded some outliers, and then taken the differences between the actual and predicted days. We then max'd and min'd the difference and we now have a prediction with a tolerable range. We will keep working on the other techniques to see if we can improve. Thanks to everyone who suggested ideas

Why does more epochs make my model worse?

Most of my code is based on this article and the issue I'm asking about is evident there, but also in my own testing. It is a sequential model with LSTM layers.
Here is a plotted prediction over real data from a model that was trained with around 20 small data sets for one epoch.
Here is another plot but this time with a model trained on more data for 10 epochs.
What causes this and how can I fix it? Also that first link I sent shows the same result at the bottom - 1 epoch does great and 3500 epochs is terrible.
Furthermore, when I run a training session for the higher data count but with only 1 epoch, I get identical results to the second plot.
What could be causing this issue?

A few questions:
Is this graph for training data or validation data?
Do you consider it better because:
The graph seems cool?
You actually have a better "loss" value?
If so, was it training loss?
Or validation loss?
Cool graph
The early graph seems interesting, indeed, but take a close look at it:
I clearly see huge predicted valleys where the expected data should be a peak
Is this really better? It sounds like a random wave that is completely out of phase, meaning that a straight line would indeed represent a better loss than this.
Take a look a the "training loss", this is what can surely tell you if your model is better or not.
If this is the case and your model isn't reaching the desired output, then you should probably make a more capable model (more layers, more units, a different method, etc.). But be aware that many datasets are simply too random to be learned, no matter how good the model.
Overfitting - Training loss gets better, but validation loss gets worse
In case you actually have a better training loss. Ok, so your model is indeed getting better.
Are you plotting training data? - Then this straight line is actually better than a wave out of phase
Are you plotting validation data?
What is happening with the validation loss? Better or worse?
If your "validation" loss is getting worse, your model is overfitting. It's memorizing the training data instead of learning generally. You need a less capable model, or a lot of "dropout".
Often, there is an optimal point where the validation loss stops going down, while the training loss keeps going down. This is the point to stop training if you're overfitting. Read about the EarlyStopping callback in keras documentation.
Bad learning rate - Training loss is going up indefinitely
If your training loss is going up, then you've got a real problem there, either a bug, a badly prepared calculation somewhere if you're using custom layers, or simply a learning rate that is too big.
Reduce the learning rate (divide it by 10, or 100), create and compile a "new" model and restart training.
Another problem?
Then you need to detail your question properly.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.