I am training a binary classifier that distinguishes between disease and non-disease.
When I run the model, the training loss decreases and AUC and accuracy increase.
But after a certain epoch, the training loss increases and AUC and accuracy decrease.
I don't know why training performance degrades after a certain epoch.
I used a general 1D CNN model and standard methods; details here:
I have already tried:
shuffling the batches
introducing class weights
changing the loss (binary_crossentropy > BinaryFocalLoss)
changing the learning rate
A few questions for you going forward.
Do the training and validation accuracy keep dropping if you just let it run for, say, 100 epochs? That is definitely something I would try.
Which optimizer are you using? SGD? ADAM?
How large is your dropout rate? Maybe this value is too large. Try training without dropout and check whether the behavior is still the same.
It might also be the optimizer.
As you do not seem to augment your data (augmentation could be a potential issue if it accidentally breaks the association between samples and labels), each epoch should see similar gradients. My guess is that, at this point in your optimization process, the learning rate, and therefore the update step, is not adjusted properly: the optimizer cannot progress further into that local optimum and instead oversteps the minimum, which decreases both training and validation performance.
This is an intuitive explanation and the next things I would try are:
Scheduling the learning rate
Using a more sophisticated optimizer (starting with ADAM if you are not already using it)
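To make the learning-rate scheduling idea concrete, here is a minimal, framework-free sketch of a "reduce on plateau" rule: the rate is cut whenever the monitored loss stops improving for a few epochs. The function name, the default values, and the example losses are all illustrative assumptions, not taken from the question. Keras ships a ReduceLROnPlateau callback that works along similar lines.

```python
def reduce_on_plateau(losses, initial_lr=0.01, factor=0.5, patience=2, min_lr=1e-6):
    """Return the learning rate in effect during each epoch.

    `losses` holds one monitored loss value per epoch. Whenever the loss
    fails to improve for `patience` consecutive epochs, the rate is
    multiplied by `factor` (but never drops below `min_lr`).
    All defaults are illustrative assumptions.
    """
    lr = initial_lr
    best = float("inf")
    wait = 0
    schedule = []
    for loss in losses:
        schedule.append(lr)      # rate used while this epoch was running
        if loss < best:
            best = loss          # new best loss: reset the waiting counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)   # plateau detected: shrink the step
                wait = 0
    return schedule


# Hypothetical loss curve that improves, then gets worse for two epochs.
example_losses = [1.0, 0.8, 0.7, 0.75, 0.9, 0.6]
print(reduce_on_plateau(example_losses))
```

In this sketch the rate only changes after the loss has failed to improve for two epochs in a row, which is the situation described above where the update step has become too large for the region the optimizer is in.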
Your model is overfitting. This is why your accuracy increases and then begins decreasing. You need to implement early stopping to stop at the epoch with the best results. You should also add dropout layers.
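As a rough illustration of what early stopping does, the sketch below scans a list of per-epoch validation losses and reports where training would stop and which epoch was best. The function name, the `patience` default, and the example numbers are assumptions made for this example. In Keras the same idea is available as the EarlyStopping callback.

```python
def best_epoch_with_patience(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training "stops" at the first epoch that is `patience` epochs past the
    best validation loss seen so far. If that never happens, the last epoch
    is returned as the stop epoch.
    """
    best = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch    # no improvement for `patience` epochs
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses: best at epoch 2, then steadily worse.
stop, best = best_epoch_with_patience([0.9, 0.7, 0.6, 0.65, 0.7, 0.8, 0.9])
print(f"stop at epoch {stop}, keep weights from epoch {best}")
```

The weights worth keeping are the ones from the best epoch, not from the epoch at which training stopped.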
I am building an ANN as below:
I am adding different features to the model and checking whether each newly added feature decreases or increases the accuracy. The problem is that each time I run this code with the same values I get a different accuracy; sometimes it gets as low as 0.50. So I have a few questions:
Is the model giving a different accuracy each time because dropout regularization silences random nodes, so different nodes are dropped on each run, giving sometimes low and sometimes high accuracy?
How can I trust the accuracy of the model if it gives a different accuracy each time? How can I know whether the feature I have added has decreased or increased the accuracy?
If I get high accuracy and want to reproduce these results, how do I save the parameters that the model has used?
Great questions. Answers:
I think your theory is right; it's the dropout. It is the only layer that stays random at every training step, so it's the likely culprit (random weight initialization and data shuffling also vary between runs unless you fix the random seeds). Try removing that layer, leaving everything else fixed, and run multiple times. Check whether the accuracy is the same.
Cross validation. This article explains how it works, but the gist is that it is a statistical technique that trains and checks the accuracy of multiple runs of your model, all with different slices of data. The average accuracy of all runs is used. So highs and lows will be averaged to a true(ish) accuracy. That being said, if your model has inconsistent results by just varying dropout, it's an indicator that when you move the model to production and use real data, it will perform poorly.
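The gist of k-fold cross validation can be sketched in a few lines of plain Python. Everything here is an illustrative assumption: `k_fold_indices` and `cross_validate` are made-up helper names, and `train_and_score` stands for whatever function builds, trains, and evaluates your model on the given index lists and returns an accuracy.

```python
import random
from statistics import mean, stdev


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, val_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # fixed seed: same split every run
    folds = [indices[i::k] for i in range(k)]     # k disjoint slices of the data
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


def cross_validate(train_and_score, n_samples, k=5, seed=0):
    """Run the (hypothetical) train_and_score callback once per fold.

    Returns the mean score, its standard deviation, and the per-fold scores.
    Requires k >= 2 so that a standard deviation can be computed.
    """
    scores = [train_and_score(train, val)
              for train, val in k_fold_indices(n_samples, k, seed)]
    return mean(scores), stdev(scores), scores


if __name__ == "__main__":
    # Stand-in scorer so the sketch runs on its own; replace it with real
    # model training and evaluation.
    def dummy_score(train, val):
        return len(train) / (len(train) + len(val))

    avg, spread, per_fold = cross_validate(dummy_score, n_samples=100, k=5)
    print(avg, spread, per_fold)
```

Comparing the mean score with and without a new feature, and checking that the difference is larger than the spread across folds, gives a steadier signal than comparing two single runs.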
The Keras API has a method, model.save("model_name"), to save models. You can use keras.models.load_model("model_name") to get it back. As I said in point 2, though, if your model is so finicky that some training runs drastically change the accuracy, then even if you train and get good accuracy, it probably won't be useful on new data. So when you say "If I get high accuracy and wanted to reproduce these results", you really shouldn't be thinking along these lines. Instead, try to get consistently high training accuracy.
I have a problem when training a U-Net, which has many similarities with a CNN, in Keras with TensorFlow. When training starts, the accuracy increases and the loss steadily goes down. At around epoch 40, in my example, the validation loss jumps to its maximum and the validation accuracy drops to zero. What can I do to prevent that from happening? I am using an approach similar to this one for my code in Keras.
Example image of the Loss
I have already tried changing the learning rate, adding dropout, and changing optimizers; none of these changed the curve for the better. As I have a big training set, it is very unlikely that I am encountering overfitting.
I am training a model that does image captioning. I noticed that my model gets a very high training accuracy in the first epoch (around 89%), as well as a very high validation accuracy. In fact, the training accuracy is high right from the beginning of the first epoch: it starts at around 60% and goes up to 80% very quickly. That does not make sense to me, because the model seems to learn very fast and reach a very high accuracy right at the beginning.
Here is a screenshot of the output
If you are using mini-batches during fitting, you can watch the accuracy and loss change during each iteration. Your first few mini-batches will probably be terrible and then jump up around half-way through if the optimizer has found a reasonable local minimum. I've had this happen a lot; it depends a lot on which optimizer I use, the size of the model, and the amount of data. On its own, it isn't necessarily a bad thing. But be sure to check for over-fitting with a test set.
Most of my code is based on this article and the issue I'm asking about is evident there, but also in my own testing. It is a sequential model with LSTM layers.
Here is a plotted prediction over real data from a model that was trained with around 20 small data sets for one epoch.
Here is another plot but this time with a model trained on more data for 10 epochs.
What causes this and how can I fix it? Also that first link I sent shows the same result at the bottom - 1 epoch does great and 3500 epochs is terrible.
Furthermore, when I run a training session for the higher data count but with only 1 epoch, I get identical results to the second plot.
What could be causing this issue?
A few questions:
Is this graph for training data or validation data?
Do you consider it better because:
The graph seems cool?
You actually have a better "loss" value?
If so, was it training loss?
Or validation loss?
Cool graph
The early graph seems interesting, indeed, but take a close look at it:
I clearly see huge predicted valleys where the expected data should be a peak
Is this really better? It sounds like a random wave that is completely out of phase, meaning that a straight line would indeed represent a better loss than this.
Take a look at the "training loss"; this is what can surely tell you whether your model is better or not.
If this is the case and your model isn't reaching the desired output, then you should probably make a more capable model (more layers, more units, a different method, etc.). But be aware that many datasets are simply too random to be learned, no matter how good the model.
Overfitting - Training loss gets better, but validation loss gets worse
If you actually have a better training loss, then your model is indeed getting better.
Are you plotting training data? - Then this straight line is actually better than a wave out of phase
Are you plotting validation data?
What is happening with the validation loss? Better or worse?
If your "validation" loss is getting worse, your model is overfitting. It's memorizing the training data instead of learning generally. You need a less capable model, or a lot of "dropout".
Often, there is an optimal point where the validation loss stops going down, while the training loss keeps going down. This is the point to stop training if you're overfitting. Read about the EarlyStopping callback in keras documentation.
Bad learning rate - Training loss is going up indefinitely
If your training loss is going up, then you've got a real problem there, either a bug, a badly prepared calculation somewhere if you're using custom layers, or simply a learning rate that is too big.
Reduce the learning rate (divide it by 10, or 100), create and compile a "new" model and restart training.
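The cases described above can be summarized as a crude decision rule over the recorded loss curves. This is only a sketch of the reasoning, not a reliable diagnostic: the function name, the labels, and the tolerance `tol` are assumptions made for illustration, and real curves are usually noisy enough that you should look at the plot as well.

```python
def diagnose(train_losses, val_losses, tol=1e-3):
    """Very rough classification of a training run from its loss history.

    Compares the first and last training loss, and the last validation loss
    against the best validation loss seen. Thresholds are arbitrary
    assumptions for this example.
    """
    train_delta = train_losses[-1] - train_losses[0]
    val_rebound = val_losses[-1] - min(val_losses)

    if train_delta > tol:
        # Training loss went up: suspect a bug or a learning rate that is too big.
        return "diverging"
    if abs(train_delta) <= tol:
        # Training loss did not move: the model may not be capable enough.
        return "not learning"
    if val_rebound > tol:
        # Training loss improved but validation loss got worse again.
        return "overfitting"
    return "ok"


# Hypothetical histories for the cases discussed above.
print(diagnose([1.0, 0.5, 0.2], [1.0, 0.6, 0.9]))
print(diagnose([1.0, 1.5, 3.0], [1.0, 2.0, 3.0]))
```

A run flagged as diverging is the case where reducing the learning rate and restarting with a freshly compiled model is worth trying first.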
Another problem?
Then you need to detail your question properly.
I have a question concerning tuning hyperparameters for the Inception ResNet V2 model (or any other DL model), which I can't really wrap my head around.
Right now, I have set certain hyperparameters, such as learning_rate, decay_factor and decay_after_nr_epochs. My model saves checkpoints, so it can continue from these points later on.
If I run the model again, with more epochs, it logically continues at the last checkpoint to continue training.
However, if I would set new hyperparameters, such as learning_rate = 0.0001 instead of learning_rate = 0.0002, does it make sense to continue on the checkpoints, or is it better to use new hyperparameters on the initial model?
The latter sounds more logical to me, but I'm not sure whether this is necessary.
Thanks in advance.
Both approaches are okay, but you have to watch your training loss after adjusting the hyperparameters. If training converges in both cases, then it's fine; otherwise adjust accordingly.
As far as I know, people commonly adopt these two methods:
1. Keep a higher learning rate initially and apply a decay factor, reducing the learning rate slowly as training starts converging.
2. Keep an eye on the loss function and stop early if you think you can switch to a better learning rate.
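The first method (start higher and decay) can be written as a small function of the epoch number. This is a minimal sketch that reuses the hyperparameter names from the question; the default of 0.0002 comes from the question, while the values for decay_factor and decay_after_nr_epochs are assumptions chosen only to make the example run.

```python
def step_decay(epoch, learning_rate=0.0002, decay_factor=0.5, decay_after_nr_epochs=10):
    """Learning rate for a given 0-indexed epoch under step decay.

    The rate is multiplied by `decay_factor` once every
    `decay_after_nr_epochs` epochs.
    """
    nr_decays = epoch // decay_after_nr_epochs    # how many decay steps have passed
    return learning_rate * decay_factor ** nr_decays


# Print the rate at a few epochs to see the staircase shape.
for epoch in (0, 9, 10, 25):
    print(epoch, step_decay(epoch))
```

Because the rate depends only on the epoch number, resuming from a checkpoint at epoch N with the same settings continues the same schedule; changing learning_rate mid-run changes every later step of it, which is one reason to watch the training loss after such a change.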