Validation accuracy increasing but validation loss is also increasing - python

I am using a CNN to classify images into 5 classes. My dataset has around 370K images. I am using the Adam optimizer with a learning rate of 0.0001 and a batch size of 32. Surprisingly, validation accuracy keeps improving over the epochs, but validation loss keeps growing.
My assumption is that the model is becoming less and less sure about the validation set, but accuracy still improves because the softmax output for the predicted class remains above the threshold value.
What can be the reason behind this? Any help in this regard would be highly appreciated.
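(A toy illustration of how this can happen, using made-up softmax outputs rather than anything from the model above: accuracy only looks at the arg-max class, while cross-entropy also punishes growing confidence on the remaining mistakes, so the two metrics can rise together.)

```python
import numpy as np

def cross_entropy(probs, labels):
    # Mean negative log-probability assigned to the true class.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def accuracy(probs, labels):
    # Fraction of samples whose arg-max class matches the label.
    return np.mean(probs.argmax(axis=1) == labels)

labels = np.array([0, 0, 0, 0])  # true class is 0 for all four samples

# "Earlier epoch": 2 of 4 correct; the mistakes are only mildly confident.
early = np.array([
    [0.5, 0.2, 0.1, 0.1, 0.1],
    [0.5, 0.2, 0.1, 0.1, 0.1],
    [0.3, 0.4, 0.1, 0.1, 0.1],   # wrong, but not confidently
    [0.3, 0.4, 0.1, 0.1, 0.1],
])

# "Later epoch": 3 of 4 correct, but the remaining mistake is made with
# high confidence, which cross-entropy punishes hard.
late = np.array([
    [0.90, 0.04, 0.02, 0.02, 0.02],
    [0.90, 0.04, 0.02, 0.02, 0.02],
    [0.90, 0.04, 0.02, 0.02, 0.02],
    [0.01, 0.96, 0.01, 0.01, 0.01],  # confidently wrong
])

print(accuracy(early, labels), cross_entropy(early, labels))  # 0.50, ~0.95
print(accuracy(late, labels), cross_entropy(late, labels))    # 0.75, ~1.23
```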

I think this is a case of overfitting, as the previous comments pointed out. Overfitting is the result of a high-variance, overly complex model: as training went on, your CNN kept driving the training error down by fitting an ever more complex function, and more complex models overfit, which shows up as a validation error that starts to increase.
The Adam optimizer takes care of the learning rate, its exponential decay, and the optimization of the model in general, but it won't take any action against overfitting. If you want to reduce overfitting, you will need to add a regularization technique that penalizes large values of the weights in the model.
You can read more details about this in the deep learning book: http://www.deeplearningbook.org/contents/regularization.html
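As a minimal sketch of what adding such a penalty looks like in Keras (the layer sizes and input shape below are placeholders, not the asker's actual architecture):

```python
from tensorflow.keras import layers, models, regularizers

# Hypothetical 5-class CNN head; the L2 penalty (weight decay) on each
# kernel_regularizer discourages large weights and so limits overfitting.
model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(128, 128, 3),
                  kernel_regularizer=regularizers.l2(1e-4)),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```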

Related

What causes neural network accuracy to sharply increase after only one epoch?

I'm using a relatively simple neural network with fully connected layers in Keras. For some reason, the accuracy jumps almost to its final value after only one training epoch (and likewise, the loss sharply decreases). I've tried architectures with larger and smaller numbers of hidden layers too. The network also performs poorly on the test data, so I am trying to find a better architecture or improve my training set accordingly.
It is trained on a set of 6500 1D array-like data, and I'm using a batch size of 512.
As Murilo said, it is hard to say much without more information, but it can come from multiple things:

- Your network learns through the batches of each epoch, meaning that your ~12 batches (6500/512) are already enough to learn a good bit of the classification.
- Your weights are not well initialized and produce a huge loss on the first epoch. The massive decrease in the loss is actually the solver 'squishing' the weights. The best explanation I found for this comes from A. Karpathy in his 'MakeMore' tutorial: https://youtu.be/P6sfmUTpUmc?t=260

Now, this sudden decrease of the loss is not extreme here (from 0.5 to 0.2), so I would not worry much. I agree with Murilo that low validation accuracy can come from too few samples in your validation set, or from bad shuffling between the train and validation sets.
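One quick sanity check from that video, sketched below (the model and batch are placeholders): with reasonably scaled initial weights, the loss on the very first batch should sit close to ln(num_classes), the cross-entropy of a uniform guess, and a much larger value hints at bad initialization.

```python
import numpy as np

num_classes = 10  # placeholder; use your own class count

# Cross-entropy of a model that predicts every class uniformly:
expected = np.log(num_classes)
print(f"expected initial loss: {expected:.3f}")  # ~2.303 for 10 classes

# Hypothetical check against your own (compiled, untrained) model:
# loss_before = model.evaluate(x_batch, y_batch, verbose=0)[0]
# If loss_before is far above `expected`, the initial weights are suspect.
```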

What does a sudden increase in accuracy during epoch training show about my model?

I am learning Convolutional Neural Networks now and practicing on the Kaggle digit recognizer (MNIST) dataset.
While training, I noticed that despite the initially gradual growth in accuracy, at one point there was a huge jump, from 0.8984 to 0.9814.
As a beginner, I want to understand what this jump really says about my model. Here is the image of the epochs:
[image: epoch-by-epoch training log]
I have circled the jump in yellow. Thanks in advance!
As the loss gradually decreases, the model fits the training data better, and the better the fit, the better the accuracy (which you can see easily, since accuracy rises as the loss falls). There is a difference of almost 0.08 between your consecutive loss values, which is enough for the fit to improve noticeably from its current state.
As the model progresses, we then try it on the testing dataset, because real-world data is nothing like the data it was trained on.
However, a higher training accuracy is not always good: the model may be overfitting, meaning it performs so well on the training data that it cannot handle even small changes in the input. A correct balance between learning rate and number of epochs is therefore needed to predict the classes correctly. It also depends on the architecture, on the optimizer (which keeps oscillations low), and on numerous other things.
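If the worry is training past the point where overfitting sets in, one common Keras option (a sketch; `model` and the data splits are assumed to already exist) is an early-stopping callback:

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop once val_loss has not improved for 3 epochs, and roll back to the
# best weights seen so far.
early_stop = EarlyStopping(monitor="val_loss", patience=3,
                           restore_best_weights=True)

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=50, callbacks=[early_stop])
```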

Why does the validation accuracy not increase in a normal way over the epochs?

I'm trying transfer learning with a VGG16 model pre-trained on ImageNet, on a dataset of retinal images, but I am confused by the graph I get. I don't know why the validation accuracy doesn't increase in a normal way over the epochs like the training accuracy does. Is this a sign of overfitting? If yes, how can I overcome it?
My first suggestion would be to use a ResNet (a network with residual connections) as a first step towards improvement, in order to avoid the vanishing gradient problem.
VGGs have fallen out of use and are no longer relevant for benchmarking. What you should use instead is ResNet50, which is available in tensorflow.keras.applications alongside other relevant pre-trained neural networks.
Also, the validation accuracy fluctuates a lot; in addition to the improvement suggested above, you may want to recheck how your training and validation sets are constructed.
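A minimal transfer-learning sketch along those lines (the input size, head layers, and class count are assumptions to adapt to the retinal dataset):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

# Pre-trained ImageNet backbone, frozen so that only the new head trains.
base = ResNet50(weights="imagenet", include_top=False,
                input_shape=(224, 224, 3), pooling="avg")
base.trainable = False

model = models.Sequential([
    base,
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(2, activation="softmax"),  # placeholder class count
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```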

SGD optimiser graph

I just wanted to ask a quick question. I understand that val_loss and train_loss alone are insufficient to tell whether the model is overfitting, but I wish to use them as a rough gauge by monitoring whether val_loss is increasing. Since I use the SGD optimiser, I seem to get 2 different trends depending on the smoothing value. Which should I use? Blue is val_loss and orange is train_loss.
At smoothing = 0.999 both seem to be decreasing, but at smoothing = 0.927 val_loss seems to be increasing. Thank you for reading!
Also, when is a good time to decrease the learning rate? Is it right before the model overfits?
Smoothing = 0.999
Smoothing = 0.927
In my experience with deep learning as applied to CNNs, overfitting is tied more to the difference between train and val accuracies/losses than to either one alone. In your graphs it is clear that the difference in loss increases as time goes on, showing that your model does not generalize well to the dataset and hence shows signs of overfitting. It would also help to track classification accuracy on the train and val datasets if possible; this gives you the generalization error, a similar metric that may show more visible effects.
Dropping the learning rate once the loss starts to level out and overfitting begins is a good idea; however, you may see better gains in generalization if you first adjust the network's complexity to better fit the dataset. For this kind of overfitting, a modest decrease in complexity may help; use the difference in train/val losses and accuracies to confirm.
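For the learning-rate question specifically, rather than picking the moment by eye, a common approach (a sketch, assuming a Keras model) is to let a callback drop the rate once val_loss stops improving:

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau

# Halve the learning rate whenever val_loss plateaus for 2 epochs.
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                              patience=2, min_lr=1e-6)

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           callbacks=[reduce_lr])
```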

Strategies for solving overfitting - other options?

I am building a predictive model to tell whether a package will be delivered on time (binary yes/no) and, if it is not delivered on time, to predict by when it will be delivered, in categories of <7 days, <14 days, <21 days, and >28 days after the expected date.
I have built and tested a model for the binary classification and got an F-score of 0.92, which is satisfactory for my needs. However, when I train my categorical model, training and validation accuracy diverge (training accuracy is much better than validation accuracy), which is a sign of overfitting.
I have tried regularization with different values, plus dropout with different values, and the validation accuracy never gets above 0.7. My training set has ~10k examples and my validation set ~3k, and while the categorical spread is not equal, there are sufficient examples of each category (I think). I am using a NN and have increased/decreased both layers and activations, and still no joy.
Any thoughts on where to go next? Thanks.
Since you are using a NN, introduce dropout layers and see if they help reduce the overfitting problem. Also check out How to choose the number of hidden layers and nodes in a feedforward neural network?
The more complex the network (more hidden layers, more neurons in them), the more it contributes to the overfitting problem.
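A minimal sketch of where such dropout layers would go (the layer sizes and feature count are placeholders, not the asker's actual network):

```python
from tensorflow.keras import layers, models

# Hypothetical classifier for the 4 delivery-window categories; each
# Dropout layer randomly zeroes activations during training, which
# discourages co-adaptation of neurons and reduces overfitting.
model = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(20,)),  # 20 = placeholder feature count
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```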
The approach we have chosen is to carry out a linear regression with the expected duration as the target variable. We excluded some outliers and then took the differences between the actual and predicted days. We then took the max and min of those differences, and we now have a prediction with a tolerable range. We will keep working on the other techniques to see if we can improve. Thanks to everyone who suggested ideas.
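A sketch of that residual-range idea with scikit-learn (the data here is synthetic stand-in data, not the real shipment features):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Stand-in data: X would be shipment features, y the actual delivery
# duration in days.
X = rng.normal(size=(500, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=2.0, size=500) + 10

reg = LinearRegression().fit(X, y)
residuals = y - reg.predict(X)

# The min/max of the residuals gives a tolerance band around any prediction.
low, high = residuals.min(), residuals.max()
print(f"band around a prediction: [{low:+.1f} days, {high:+.1f} days]")
```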
