May I know whether I should stop the training process at epoch 19, since overfitting starts right after that?
Or is it still OK to use the model, since the difference is not too big?
I would also like clarification on how small the difference needs to be before we should stop the training process.
Thank you.
Graph of "Loss vs Epoch" and "Acc vs Epoch"
I guess I'd like to know more about your problem, your network architecture, and so on, but given what you provided, I would first test with different batch sizes before making any decision about stopping at epoch 19. Take a look at this.
I think you should study early stopping; this article can help you learn more about early stopping to avoid overfitting. It means you don't have to worry about the exact number of epochs or the value of the loss.
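Not part of the original answer, but as a rough illustration of the idea, a minimal Keras EarlyStopping sketch might look like this (assuming you already have a compiled `model` and arrays `x_train`, `y_train`, `x_val`, `y_val`; the monitor and patience values are placeholders):

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop once val_loss has not improved for `patience` epochs,
# then roll back to the best weights seen so far.
early_stop = EarlyStopping(monitor="val_loss",
                           patience=5,
                           restore_best_weights=True)

history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=100,  # an upper bound; training may stop much earlier
                    callbacks=[early_stop])
```

With this in place, whether overfitting starts at epoch 19 or elsewhere, training stops near the best validation loss automatically.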
I'll probably spelunk more into the Catboost implementation to figure out what's going on, but I wanted to check in with the SO community first to make sure I'm not doing anything stupid before I waste my time. I was trying to test out Catboost's early_stopping_rounds to speed up parameter search, and was surprised to see that even when I raised the learning rate to stupidly high values, the model still fit through all iterations!
Asking for just a quick confirmation that my code looks OK, and whether anyone has had a similar experience working with Catboost. I've confirmed that the loss value changes sporadically as expected, but the fitting continues for all 10 iterations.
Probably the fastest self-answer, but leaving it up here since this wasn't clear to me: early_stopping_rounds ONLY works if you have eval_set specified. The more you know.
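For concreteness, here is a hedged, self-contained sketch of the behaviour described above (synthetic data only; the key point is that `early_stopping_rounds` takes effect only when `eval_set` is passed to `fit`):

```python
from catboost import CatBoostRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = CatBoostRegressor(iterations=1000, learning_rate=0.5, verbose=50)

# Without eval_set, early_stopping_rounds is effectively ignored and the
# model fits through all 1000 iterations; with it, training stops once the
# eval metric has not improved for 10 rounds.
model.fit(X_train, y_train,
          eval_set=(X_val, y_val),
          early_stopping_rounds=10)
```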
My val-accuracy is far lower than the training accuracy. What might be the reasons for this?
Thank you.
It seems your problem is all about overfitting. To understand the causes behind overfitting, the first step is to understand what overfitting is.
One resource is the linked article about overfitting.
To eliminate this issue, there are several things you should check. Especially for your model:
1) Are you using dropout? Check dropout.
2) Are you using regularization? Check regularization.
Note: These are two of the most important techniques to utilize; the list may be a lot longer if you dig deeper. A minimal sketch combining both is shown below.
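Not from the original answer, but a minimal Keras sketch combining the two checks above (layer sizes, the dropout rate, and the L2 factor are arbitrary placeholders):

```python
from tensorflow.keras import layers, models, regularizers

model = models.Sequential([
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4),  # 2) weight regularization
                 input_shape=(784,)),
    layers.Dropout(0.5),                                    # 1) dropout between layers
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```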
Furthermore, there may be some problems in your dataset. For example:
Your test-train split may be not suitable for your case.
Your dataset may be too small to train a network; maybe you should generate or collect more data. (E.g., if you're classifying images, you can flip the images or use other augmentation techniques to artificially increase the size of your dataset, as sketched below.)
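As a hedged illustration of the image-flipping idea (not part of the original answer; `ImageDataGenerator` is just one of several ways to do this, and `model`, `x_train`, `y_train`, `x_val`, `y_val` are assumed to exist):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random horizontal flips plus small shifts artificially enlarge the
# training set; the validation data is left untouched.
augment = ImageDataGenerator(horizontal_flip=True,
                             width_shift_range=0.1,
                             height_shift_range=0.1)

train_flow = augment.flow(x_train, y_train, batch_size=32)
model.fit(train_flow, validation_data=(x_val, y_val), epochs=20)
```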
My overall suggestion is to read What are the main reasons causing overfitting in machine learning?, since the answers given here are necessarily limited.
I think this is an overfitting problem; try to generalize your model more by adding regularization and using Dropout layers. You can also try one more thing: there may be periodic variation in your input dataset, so try shuffling both your train and test datasets. I think that will solve the problem. Thank you
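Not part of the original answer, but a minimal sketch of the shuffling idea (assuming NumPy arrays `x_train` and `y_train`):

```python
import numpy as np

# Shuffle samples and labels with the same permutation so any periodic
# ordering in the data does not leak into mini-batches.
perm = np.random.permutation(len(x_train))
x_train, y_train = x_train[perm], y_train[perm]
```

Note that Keras's `fit` already shuffles the training data between epochs by default (`shuffle=True`), so this mainly matters for how you split the data into train and test sets in the first place.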
Most of my code is based on this article, and the issue I'm asking about is evident there as well as in my own testing. It is a sequential model with LSTM layers.
Here is a plotted prediction over real data from a model that was trained with around 20 small data sets for one epoch.
Here is another plot but this time with a model trained on more data for 10 epochs.
What causes this, and how can I fix it? Also, that first link I sent shows the same result at the bottom: 1 epoch does great and 3500 epochs is terrible.
Furthermore, when I run a training session for the higher data count but with only 1 epoch, I get identical results to the second plot.
What could be causing this issue?
A few questions:
Is this graph for training data or validation data?
Do you consider it better because:
The graph seems cool?
You actually have a better "loss" value?
If so, was it training loss?
Or validation loss?
Cool graph
The early graph seems interesting, indeed, but take a close look at it:
I clearly see huge predicted valleys where the expected data should show a peak.
Is this really better? It looks like a random wave that is completely out of phase, meaning that a straight line would actually represent a better loss than this.
Take a look at the "training loss"; this is what can reliably tell you whether your model is better or not.
If this is the case and your model isn't reaching the desired output, then you should probably build a more capable model (more layers, more units, a different method, etc.). But be aware that many datasets are simply too random to be learned, no matter how good the model is.
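To make "more layers, more units" concrete, here is a rough sketch only (the original post mentions a sequential model with LSTM layers but shows no code, so the shapes and sizes below are placeholders):

```python
from tensorflow.keras import layers, models

timesteps, n_features = 50, 1  # placeholders; match these to your data

# A deeper and wider variant: two stacked LSTM layers with more units.
model = models.Sequential([
    layers.LSTM(128, return_sequences=True,
                input_shape=(timesteps, n_features)),
    layers.LSTM(64),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```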
Overfitting - Training loss gets better, but validation loss gets worse
In case you actually have a better training loss: OK, so your model is indeed getting better.
Are you plotting training data? - Then this straight line is actually better than a wave out of phase
Are you plotting validation data?
What is happening with the validation loss? Better or worse?
If your "validation" loss is getting worse, your model is overfitting. It's memorizing the training data instead of learning generally. You need a less capable model, or a lot of "dropout".
Often, there is an optimal point where the validation loss stops going down, while the training loss keeps going down. This is the point to stop training if you're overfitting. Read about the EarlyStopping callback in the Keras documentation.
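Not in the original answer, but one way to see that optimal point is simply to plot both curves from the Keras `History` object (a sketch, assuming `history = model.fit(...)` was run with validation data):

```python
import matplotlib.pyplot as plt

# The epoch where val_loss bottoms out while loss keeps falling is
# roughly where EarlyStopping would stop.
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```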
Bad learning rate - Training loss is going up indefinitely
If your training loss is going up, then you've got a real problem there, either a bug, a badly prepared calculation somewhere if you're using custom layers, or simply a learning rate that is too big.
Reduce the learning rate (divide it by 10, or 100), create and compile a "new" model and restart training.
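A hedged sketch of what that looks like in Keras (here `build_model()` is a hypothetical helper that returns a fresh, uncompiled copy of your architecture, and the training arrays are assumed to exist):

```python
from tensorflow.keras.optimizers import Adam

# Fresh model, same architecture, but a 10x or 100x smaller learning rate.
model = build_model()  # hypothetical helper returning a new, uncompiled model
model.compile(optimizer=Adam(learning_rate=1e-4),  # e.g. reduced from 1e-3
              loss="mse")
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=50)
```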
Another problem?
Then you need to detail your question properly.
I am learning about VGG, and I was struck by the following performance graph:
My question is this: Looking at the graph, it seems that first there is rapid growth, which then gradually slows down. This makes sense to me, since it becomes more difficult to improve a model the smaller the loss is. However, there are also three sudden drops around the 50, 75, and 100 epoch mark. I am curious as to why all the models experience this drop and rebound at the same time? What is causing it?
Thank you in advance for any help.
This is a common observation when training complex models. For instance, classic CNNs exhibit this behaviour: AlexNet and GoogleNet show two such drop-and-improve episodes in their training curves. This is a very complex and organic effect of the model's holistic learning characteristics.
To oversimplify ... there are learning bottlenecks inherent in most models, even when the topology appears to be smooth. The model learns for a while until the latter layers have adapted well during back-prop ... until that learning bumps into one of the bottlenecks, some interference in input drive and feedback that tends to stall further real progress in training the earlier layers. This indicates a few false assumptions in the learning of those lower layers, assumptions which now encounter some statistical realities in the upper layers.
The natural operation of the training process forces some early-layer chaos back into the somewhat-stable late layers, sort of an organic drop-out effect, although less random. Some of the "learned" kernels in the late layers prove to be incorrect and get their weights re-scrambled. As a result of this drop-out, the model gets briefly less accurate, but soon learns better than before, as seen in the graph.
I know of no way to predict when and how this will happen with a given topology. My personal hope is that it turns out to be some sort of harmonic resonance inherent in the topology, something like audio resonance in a closed space, or the spots/stripes on many animals.
The TensorFlow tutorial for using CNN for the cifar10 data set has the following advice:
EXERCISE: When experimenting, it is sometimes annoying that the first training step can take so long. Try decreasing the number of images that initially fill up the queue. Search for NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN in cifar10.py.
In order to play around with it, I tried decreasing this number by a lot, but it doesn't seem to change the training time. Is there anything I can do? I even tried changing it to something as low as 5, and the training session still continued very slowly.
Any help would be appreciated!
Note that this exercise only speeds up the first step time, by skipping the prefetching of a larger fraction of the data. It does not speed up the overall training.
That said, the tutorial text needs to be updated. It should read:
Search for min_fraction_of_examples_in_queue in cifar10_input.py.
If you lower this number, the first step should be much quicker because the model will not attempt to prefetch the input.
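For reference, a hedged recollection of the relevant lines in cifar10_input.py; the exact code may differ between versions of the TensorFlow models repository:

```python
# Approximate excerpt from cifar10_input.py (check your own copy of the file):
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000

min_fraction_of_examples_in_queue = 0.4  # lower this to shorten the first step
min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
                         min_fraction_of_examples_in_queue)
print('Filling queue with %d CIFAR images before starting to train. '
      'This will take a few minutes.' % min_queue_examples)
```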