Why does more epochs make my model worse?

Why does more epochs make my model worse? - python

Most of my code is based on this article and the issue I'm asking about is evident there, but also in my own testing. It is a sequential model with LSTM layers.
Here is a plotted prediction over real data from a model that was trained with around 20 small data sets for one epoch.
Here is another plot but this time with a model trained on more data for 10 epochs.
What causes this and how can I fix it? Also that first link I sent shows the same result at the bottom - 1 epoch does great and 3500 epochs is terrible.
Furthermore, when I run a training session for the higher data count but with only 1 epoch, I get identical results to the second plot.
What could be causing this issue?

A few questions:
Is this graph for training data or validation data?
Do you consider it better because:
The graph seems cool?
You actually have a better "loss" value?
If so, was it training loss?
Or validation loss?
Cool graph
The early graph seems interesting, indeed, but take a close look at it:
I clearly see huge predicted valleys where the expected data should be a peak
Is this really better? It sounds like a random wave that is completely out of phase, meaning that a straight line would indeed represent a better loss than this.
Take a look a the "training loss", this is what can surely tell you if your model is better or not.
If this is the case and your model isn't reaching the desired output, then you should probably make a more capable model (more layers, more units, a different method, etc.). But be aware that many datasets are simply too random to be learned, no matter how good the model.
Overfitting - Training loss gets better, but validation loss gets worse
In case you actually have a better training loss. Ok, so your model is indeed getting better.
Are you plotting training data? - Then this straight line is actually better than a wave out of phase
Are you plotting validation data?
What is happening with the validation loss? Better or worse?
If your "validation" loss is getting worse, your model is overfitting. It's memorizing the training data instead of learning generally. You need a less capable model, or a lot of "dropout".
Often, there is an optimal point where the validation loss stops going down, while the training loss keeps going down. This is the point to stop training if you're overfitting. Read about the EarlyStopping callback in keras documentation.
Bad learning rate - Training loss is going up indefinitely
If your training loss is going up, then you've got a real problem there, either a bug, a badly prepared calculation somewhere if you're using custom layers, or simply a learning rate that is too big.
Reduce the learning rate (divide it by 10, or 100), create and compile a "new" model and restart training.
Another problem?
Then you need to detail your question properly.

Related

Why does train data performance deteriorate dramatically?

I am training a binary classifier model that classifies between disease and non-disease.
When I run the model, training loss decreased and auc, acc, get increased.
But, after certain epoch train loss increased and auc, acc were decreased.
I don't know why training performance got decreased after certain epoch.
I used general 1d cnn model and methods, details here:
I tried already to:
batch shuffle
introduce class weights
loss change (binary_crossentropy > BinaryFocalLoss)
learning_rate change

Two questions for you going forward.
Does the training and validation accuracy keep dropping - when you would just let it run for let's say 100 epochs? Definitely something I would try.
Which optimizer are you using? SGD? ADAM?
How large is your dropout, maybe this value is too large. Try without and check whether the behavior is still the same.
It might also be the optimizer
As you do not seem to augment (this could be a potential issue if you do by accident break some label affiliation) your data, each epoch should see similar gradients. Thus I guess, at this point in your optimization process, the learning rate and thus the update step is not adjusted properly - hence not allowing to further progress into that local optimum, and rather overstepping the minimum while at the same time decreasing training and validation performance.
This is an intuitive explanation and the next things I would try are:
Scheduling the learning rate
Using a more sophisticated optimizer (starting with ADAM if you are not already using it)

Your model is overfitting. This is why your accuracy increases and then begins decreasing. You need to implement Early Stopping to stop at the Epoch with the best results. You should also implement dropout layers.

In tensorflow why for a same dropout value of 0.8 when run with adam optimiser with 50epochs give different accuracy each time i run it?

I am building ANN as below:-
model=Sequential()
model.add(Flatten(input_shape=(25,)))
model.add(Dense(25,activation='relu'))
model.add(Dropout(0.8))
model.add(Dense(16,activation='relu'))
model.add(Dropout(0.8))
model.add(Dense(5,activation='relu'))
model.add(Dense(1,activation='sigmoid'))
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
model.fit(xtraindata,ytraindata,epochs=50)
test_loss,test_acc=model.evaluate(xtestdata,ytestdata)
print(test_acc)
I am adding different features into the model and checking whether the newly added feature decreases or increases the accuracy but the problem is that each time I run this code with the same values I get different accuracy, sometimes it gets as low as 0.50 and so, I have few doubts and kindly answer them:-
Is the model giving different accuracy each time because in dropout reg. there are random dropouts in nodes and each time I run diff. nodes get silenced so thereby giving different accuracies i.e sometimes low and sometimes high?
How can I trust the accuracy of the model if each time it gives different accuracies? How can I know that the feature I have added has resulted in a decrement or increment of the accuracy?
If I get high accuracy and wanted to reproduce these results how do I save the parameters that the model has used?

Great questions. Answers:
I think your theory is right; it's the dropout. That's the only layer with an element of randomness each run, so it's likely the culprit. Try removing that layer, leaving everything else fixed, and run multiple times. Check if the accuracy is the same.
Cross validation. This article explains how it works, but the gist is that it is a statistical technique that trains and checks the accuracy of multiple runs of your model, all with different slices of data. The average accuracy of all runs is used. So highs and lows will be averaged to a true(ish) accuracy. That being said, if your model has inconsistent results by just varying dropout, it's an indicator that when you move the model to production and use real data, it will perform poorly.
Keras api has a method model.save("model_name") to save models. You can use keras.models.load_models("model_name") to get it back. As I said in point 2 though; if your model is so finicky that some trainings drastically affect accuracy, then even if you train and get good accuracy, it probably won't be useful on new data. So when you say "If I get high accuracy and wanted to reproduce these results", really you shouldn't be thinking along these lines. Instead, try to get consistently high training accuracy.

Training Loss Improving but Validation Converges Early

I am creating a CNN using TensorFlow and when training, I find that the training dataset is still improving (i.e. loss still decreasing), while the test/validation dataset has converged and is no longer improving. (Learning Curve Plot attached below)
Does anyone know why this might be the case and how could I possibly fix it, to have the validation loss reduce along with the training? Would be greatly appreciated!
Plot of my models learning curve:

The plot of losses is very typical. Your model appears to be performing very well with very low MSE losss. At this point you have essentially reached the limits of your models performance. One thing which may help is to use an adjustable learning rate. The Keras callback ReduceLROnPlateau can be setup to monitor the validation loss. If the validation loss fails to decrease for a 'patience' number of epochs the learning rate will be reduced by a factor "factor" where factor is a number less than 1. Documentation is here.
You may also want to use the Keras EarlyStopping callback. This callback can be set to monitor validation loss and halt training if it fails to decrease for "patience" number of epochs. If you set restore_best_weights=True it will leave your model with the weights used in the epoch with the lowest validation loss. This will prevent your model from returning an over fit model. My recommended code is shown below
rlronp=f.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=1)
estop=tf.keras.callbacks.EarlyStopping(monitor="val_loss",patience=3,restore_best_weights=True)
callbacks=[rlronp, estop]
In model.fit include callbacks=callbacks. I suspect neither of the above will provide much improvement. You will probably have to try some changes to your model as well. Adding a Dropout layer may help to some degree to reduce over-fitting as would including regularization. Documentation for that is here.. Of course the standard approach of getting a larger data set may also help but is not always easy to achieve. If you are working with images you could try image augmentation using say the Keras ImageDataGenerator or Tensorflow Image Augmentation layers. Documentation for that is here.. One thing I found which helps for the case of images is to crop your images to just the Region of Interest (ROI). For example if you were doing face recognition cropping the images to just be of the face will help significantly.

this means you're hitting your architecture's limit, training loss will keep decreasing (this is known as overfitting), which will eventually INCREASE validation loss, make changes to the parameters or consider altering your layers (adding, removing, etc.), maybe even look into ways you could alter the dataset.
when this happened to me a while ago I added a LSTM layer to my CNN architecture and also incorporated K-means validation, this is not a walkthrough, you need to figure this out for your specific problem, good luck.

What does a sudden increase in accuracy during epoch training show about my model?

I am learning Convolution Neural Network now and practicing it on kaggle digit recognizer (MNIST) dataset.
While training the data, I noticed that inspite of initial gradually growing accuracy, in between there was a huge jump i.e from 0.8984 to 0.9814.
As a beginner, I want to investigate what does this jump really show about my model. Here is the image of the epochs:
enter image description here
I have circled the jump in yellow. Thanks in advance!

As the loss gradually starts to decrease, this create an impact on fitting of the model. The cost function makes the loss go down, which directly creates an impact on the fitting of model. Better the fitting of model into training data, better the accuracy (which we can easily see as the accuracy increases with the reduction in loss). There is almost a difference of 0.08 in your consecutive loss function which is enough for the model to fit more from the current state.
Now as the model progresses, we try it on the testing dataset because the real world data is nothing like the data we trained it on.
However, a higher accuracy might not always be good as the model is considered to be over-evaluated which is also known as overfitting which means the model is performing too well that it can't handle any little changes. Therefore, a correct balance between learning rate and epochs are required in order to predict the classes correctly. It also depends on the architecture, Optimizing function which make sure the oscillations are low and numerous other things.

training loss decreases while dev loss increases

I'm observing the following patterns in a one-layer CNN, binary classification model:
Training loss decreases while dev loss increased with number of steps
Training accuracy increases while dev accuracy decreases with number of steps
Based on past SO questions and literature review, it seems that these patterns are indicative of over-fitting (the model performs well in training, but cannot generalize to new examples).
The graphs below illustrate the loss and accuracy with respect to the number of steps in training.
In both,
The orange line represents the summary of the dev set performance.
The blue line represents the summary of the training set performance.
Loss:
Accuracy:
Traditional remedies I've considered, and my observations about them:
Adding L2 Regularization : I've tried many coefficients of L2 regularization -- from 0.0 to 4.5; all of these tests yield a similar pattern by the 5,000th step in both loss and accuracy.
Cross validation : It seems that the role of cross-validation is widely mis-understood online. As this answer states, cross-validation is for model checking, not model building. Indeed, cross-validation would be a way to check if the model generalizes well. And actually, the graphs I show are from one fold of a 4-fold cross-validation. If I observe a similar pattern in the loss/accuracy in all the folds, what other insight does cross-validation offer other than the confirmation that the model does not generalize well?
Early stopping : This would seem the most intuitive, but the loss graph seems to indicate that the loss levels out only after a divergence in the dev set loss is observed; the starting point of this early stop, then, doesn't seem easy to decide.
Data : The amount of labeled data I have available is limited, so training on more data is not an option right now.
All this said, what I am asking is:
If the patterns observed in the loss and accuracy are indeed indicative of over-fitting, are there any other methods to counteract over-fitting that I haven't considered?
If these patterns are not indicative of over-fitting, what else could they mean?
Thanks -- any insight would be much appreciated.

I think that you are totally on the right track. Looks like classic over-fitting.
One option is adding dropout if you don't already have it. It falls into the category of regularization, but it is more commonly used now then L1 and L2 regularization.
Changing the model architcture could get better results but it's hard to say what specifically would be best. It could help to make it deeper with more layers and possibly some pooling layers. It will likely still overfit but you might get a higher accuracy on the dev set before that happens.
Getting more data may be one of the best things you could do. If you can't get more data you can try to augment the data. You can also try cleaning the data to remove noise which can help prevent the model from fitting to noise.
You may ultimately want to try setting up a hyperparameter optimization search. This, however, can take a while on neural nets which take a while to train. Make sure you remove a test set before hyper parameter tuning.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.