Python Keras - Accuracy drops to zero

I have a problem when training a U-Net (which is very similar to a CNN) in Keras with TensorFlow. When training starts, the accuracy increases and the loss steadily goes down. At around epoch 40 in my example, the validation loss jumps to its maximum and the validation accuracy drops to zero. What can I do to prevent this from happening? I am using an approach similar to this one for my code in Keras.
Example image of the Loss
Edit:
I have already tried changing the learning rate, adding dropout, and changing optimizers; none of these improved the curve. Since I have a large training set, it is very unlikely that I am encountering overfitting.

Related

What causes neural network accuracy to sharply increase after only one epoch?

I'm using a relatively simple neural network with fully connected layers in Keras. For some reason, the accuracy drastically increases, basically to its final value, after only one training epoch (likewise, the loss sharply decreases). I've also tried architectures with larger and smaller numbers of hidden layers. The network also performs poorly on the testing data, so I am trying to find a more optimal architecture or improve my training set accordingly.
It is trained on a set of 6500 1D array-like samples, and I'm using a batch size of 512.
As Murilo said, it is hard to say much without more information, but it can come from multiple things:
Your network learns through the batches of each epoch, meaning that your ~12 batches (6500/512) are already enough to learn a good bit of the classification.
Your weights are not really well initialized and produce a huge loss for the first epoch. The massive decrease in the loss is actually the solver 'squishing' the weights. The best explanation I found for this comes from A. Karpathy in his 'MakeMore' tutorial: https://youtu.be/P6sfmUTpUmc?t=260 (see the sketch right after this answer).
Now, this sudden decrease of the loss is not extreme here (from 0.5 to 0.2), so I would not worry much. I agree with Murilo that low validation accuracy can come from too few samples in your validation set, or from bad shuffling between the train and validation sets.
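To make the initialization point concrete, here is a minimal sketch (a toy model, not the asker's network) comparing the initial loss of a softmax classifier whose final layer is initialized with large versus small random weights. With C balanced classes, a well-calibrated untrained model should start near -ln(1/C); a much larger initial loss usually means the first epochs are spent "squishing" oversized weights rather than learning.

```python
import numpy as np
import tensorflow as tf

num_classes = 10

# Expected initial loss for a well-calibrated untrained classifier: -ln(1/C).
print("expected initial loss:", -np.log(1.0 / num_classes))  # ~2.30 for 10 classes

def make_model(last_layer_stddev):
    # Hypothetical toy classifier; the stddev of the final layer's kernel
    # initializer is the point of interest, the rest is arbitrary.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(64,)),
        tf.keras.layers.Dense(
            num_classes,
            activation="softmax",
            kernel_initializer=tf.keras.initializers.RandomNormal(stddev=last_layer_stddev),
        ),
    ])

# Random placeholder data, just to evaluate the untrained loss.
x = np.random.randn(512, 64).astype("float32")
y = np.random.randint(0, num_classes, size=(512,))

for stddev in (1.0, 0.01):
    model = make_model(stddev)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    # Large final-layer weights -> saturated softmax -> loss far above 2.30.
    # Small final-layer weights -> near-uniform softmax -> loss close to 2.30.
    print(f"stddev={stddev}: initial loss =", model.evaluate(x, y, verbose=0))
```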

Why does train data performance deteriorate dramatically?

I am training a binary classifier model that classifies between disease and non-disease.
When I run the model, the training loss decreases and the AUC and accuracy increase.
But after a certain epoch, the training loss increases and the AUC and accuracy decrease.
I don't know why the training performance deteriorates after a certain epoch.
I used a general 1D CNN model and standard methods; details here:
I have already tried:
batch shuffling
introducing class weights (see the sketch after this list)
changing the loss (binary_crossentropy > BinaryFocalLoss)
changing the learning rate
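For reference, a minimal sketch of the class-weight and focal-loss setup listed above; it assumes TF 2.9+ (where tf.keras.losses.BinaryFocalCrossentropy is available), and the toy data, layer sizes, and gamma value are placeholders rather than the asker's actual configuration.

```python
import numpy as np
import tensorflow as tf

# Placeholder 1D data: x_train has shape (samples, timesteps, 1), y_train is 0/1.
x_train = np.random.randn(1000, 128, 1).astype("float32")
y_train = np.random.randint(0, 2, size=(1000,))

# Class weights to counter the disease / non-disease imbalance.
neg, pos = np.bincount(y_train)
total = neg + pos
class_weight = {0: total / (2.0 * neg), 1: total / (2.0 * pos)}

model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(32, 7, activation="relu", input_shape=(128, 1)),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Swapping binary_crossentropy for a focal loss; gamma > 0 down-weights
# easy examples so training focuses on the hard ones.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss=tf.keras.losses.BinaryFocalCrossentropy(gamma=2.0),
    metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
)

model.fit(x_train, y_train, epochs=5, batch_size=32,
          shuffle=True, class_weight=class_weight)
```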
A few questions for you going forward:
Does the training and validation accuracy keep dropping when you just let it run for, say, 100 epochs? That is definitely something I would try.
Which optimizer are you using? SGD? Adam?
How large is your dropout? Maybe the value is too large. Try without it and check whether the behavior is still the same.
It might also be the optimizer.
Since you do not seem to augment your data (which could be a potential issue if you accidentally break some label affiliation), each epoch should see similar gradients. My guess is that, at this point in your optimization process, the learning rate and therefore the update step is not adjusted properly: it no longer allows further progress into that local optimum and instead oversteps the minimum, decreasing both training and validation performance.
This is an intuitive explanation, and the next things I would try are (sketched below):
Scheduling the learning rate
Using a more sophisticated optimizer (starting with Adam if you are not already using it)
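A minimal sketch of both suggestions, using Adam together with Keras's built-in ReduceLROnPlateau callback; the toy data and model below are placeholders, only the optimizer and callback wiring matter.

```python
import numpy as np
import tensorflow as tf

# Placeholder data and model.
x = np.random.randn(256, 20).astype("float32")
y = np.random.randint(0, 2, size=(256,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Adam adapts per-parameter step sizes; ReduceLROnPlateau additionally shrinks
# the global learning rate when validation loss plateaus, so later epochs take
# smaller steps instead of overshooting the minimum.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])

reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss",
    factor=0.5,     # halve the learning rate...
    patience=5,     # ...after 5 epochs without improvement
    min_lr=1e-6,
)

model.fit(x, y, validation_split=0.2, epochs=50, batch_size=32,
          callbacks=[reduce_lr], verbose=2)
```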
Your model is overfitting. This is why your accuracy increases and then begins decreasing. You need to implement early stopping to stop at the epoch with the best results, and you should also add dropout layers.
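A minimal sketch of early stopping plus dropout in Keras; the toy data, layer sizes, and dropout rate are placeholders.

```python
import numpy as np
import tensorflow as tf

# Placeholder data; the EarlyStopping callback and Dropout layers are the point.
x = np.random.randn(500, 20).astype("float32")
y = np.random.randint(0, 2, size=(500,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dropout(0.3),          # randomly drop 30% of activations
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stop training when validation loss has not improved for 10 epochs and
# roll back to the weights from the best epoch.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)

model.fit(x, y, validation_split=0.2, epochs=200, batch_size=32,
          callbacks=[early_stop], verbose=2)
```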

Why is tf.keras BatchNormalization causing GANs to produce nonsense loss and accuracy?

Background:
I've been getting unusual losses and accuracies when training GANs with batch normalization layers in the discriminator using tf.keras. GANs have an optimal objective function value of log(4), which occurs when the discriminator is completely unable to discern real samples from fakes and hence predicts 0.5 for all samples. When I include BatchNormalization layers in my discriminator, both the generator and the discriminator achieve near perfect scores (high accuracy, low loss), which is impossible in an adversarial setting.
Without BatchNorm:
This figure shows the losses (y) per epoch (x) when BN is not used. Note that occasional values below the theoretical minimum are due to the training being an iterative process.
This figure shows the accuracies when BN is not used, which settle at about 50% each. Both of these figures show reasonable values.
With BatchNorm:
This figure shows the losses (y) per epoch (x) when BN is used. See how the GAN objective, which shouldn't fall below log(4), approaches 0. This figure shows the accuracies when BN is used, with both approaching 100%. GANs are adversarial; the generator and discriminator can't both have 100% accuracy.
Question:
The code for building and training the GAN can be found here. Am I missing something, and have I made a mistake in my implementation, or is there a bug in tf.keras? I'm pretty sure that this is a technical issue and not a theoretical problem that "GAN-hacks" can solve. Note that this only involves using BatchNormalization layers in the discriminator; using them in the generator does not cause this issue.
There is an issue with Tensorflow's BatchNormalization layer in TF 2.0 and 2.1; downgrading to TF 1.15 resolves the problem. The cause of the problem has not yet been determined.
Here is the relevant GitHub issue: https://github.com/tensorflow/tensorflow/issues/37673
The cause of the problem is straightforward: the discriminator learns to distinguish between the training and inference phases of the BatchNormalization layer, instead of learning to distinguish the data.
In the training phase, the actual batch mean and variance are used in BN, as opposed to the inference phase, where the moving mean and moving variance stored in the layer are used.
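As a rough illustration of that phase difference (a standalone sketch, not taken from the asker's GAN code): calling a BatchNormalization layer with training=True normalizes with the current batch statistics, while the default inference path uses the stored moving averages, so the same input can produce very different outputs in the two phases.

```python
import numpy as np
import tensorflow as tf

# A single BatchNormalization layer is enough to show the phase difference.
bn = tf.keras.layers.BatchNormalization()
x = np.random.randn(64, 8).astype("float32") * 5.0 + 3.0  # non-zero mean/variance

# Training phase: normalizes with the current batch's mean/variance and
# updates the layer's moving statistics.
out_train = bn(x, training=True)

# Inference phase: normalizes with the stored moving mean/variance, which
# after only one update still sit close to their initial values of 0 and 1.
out_infer = bn(x, training=False)

print("train-phase output mean:    ", float(tf.reduce_mean(out_train)))  # ~0
print("inference-phase output mean:", float(tf.reduce_mean(out_infer)))  # far from 0

# In a GAN, if the discriminator effectively sees real and fake batches under
# different BN statistics, it can key on that discrepancy instead of the data.
```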

Is it normal to have a high accuracy in the training from the beginning in keras?

I am training a model that does image captioning. I noticed that my model gets a very high training accuracy in the first epoch (around 89%), as well as a high validation accuracy. The training accuracy actually starts at a very high point from the beginning of the first epoch: it starts around 60% and goes up to 80% very quickly. That does not make sense to me, because the model is learning very fast and with very high accuracy right from the start.
Here is a screenshot of the output
If you are using mini-batches during fitting, you can watch the accuracy and loss change during each iteration. Your first few mini-batches will probably be terrible and then jump up around half-way through if the optimizer has found a reasonable local minimum. I've had this happen a lot; it depends a lot on which optimizer I use, the size of the model, and the amount of data. On its own, it isn't necessarily a bad thing. But be sure to check for over-fitting with a test set.
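If it helps to watch this per-batch behavior directly, here is a minimal sketch of a custom Keras callback that prints loss and accuracy after every training batch; the BatchLogger class and the toy model/data are illustrative placeholders.

```python
import numpy as np
import tensorflow as tf

class BatchLogger(tf.keras.callbacks.Callback):
    """Print loss/accuracy after every training batch to watch the jumps."""
    def on_train_batch_end(self, batch, logs=None):
        logs = logs or {}
        print(f"batch {batch}: loss={logs.get('loss'):.4f} "
              f"acc={logs.get('accuracy'):.4f}")

# Placeholder data and model; the callback is the point of interest.
x = np.random.randn(2048, 16).astype("float32")
y = (x.sum(axis=1) > 0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(16,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

model.fit(x, y, epochs=2, batch_size=128, callbacks=[BatchLogger()], verbose=0)
```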

Validation accuracy increasing but validation loss is also increasing

I am using a CNN to classify images into 5 classes. My dataset contains around 370K images. I am using the Adam optimizer with a learning rate of 0.0001 and a batch size of 32. Surprisingly, the validation accuracy improves over the epochs, but the validation loss keeps growing.
I am assuming that the model is becoming more and more sure about the validation set, and the accuracy is higher because the softmax output for the predicted class is above the threshold value.
What can be the reason behind this? Any help in this regard would be highly appreciated.
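A tiny numeric sketch of how this can happen (the probabilities below are hypothetical, not from the asker's model): accuracy only checks whether the correct class wins, while cross-entropy also penalizes confidence, so a few very confident mistakes can push the loss up even as accuracy improves.

```python
import numpy as np

# p = probability the model assigns to the TRUE class of each validation sample.
# Once p > 0.5 the true class is guaranteed to be the argmax (counted correct),
# no matter how far above 0.5 it is; cross-entropy keeps tracking confidence.

def cross_entropy(p):
    return float(np.mean(-np.log(p)))

def accuracy(p):
    return float(np.mean(p > 0.5))

early = np.array([0.6, 0.6, 0.6, 0.6, 0.15, 0.15])        # earlier epoch
late  = np.array([0.95, 0.95, 0.95, 0.95, 0.95, 0.001])   # later epoch

print("early: acc =", round(accuracy(early), 3), "loss =", round(cross_entropy(early), 3))
# early: acc = 0.667 loss = 0.973
print("late : acc =", round(accuracy(late), 3),  "loss =", round(cross_entropy(late), 3))
# late : acc = 0.833 loss = 1.194  -> accuracy up, loss also up
```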
I think this is a case of overfitting, as previous comments pointed out. Overfitting can be the result of high variance in the dataset. As you trained the CNN, the training error kept decreasing, which produced an increasingly complex model. More complex models tend to overfit, and this shows up when the validation error starts to increase.
The Adam optimizer takes care of the learning rate, exponential decay, and the optimization of the model in general, but it won't take any action against overfitting. If you want to reduce overfitting, you will need to add a regularization technique that penalizes large values of the weights in the model.
You can read more details about this in the deep learning book: http://www.deeplearningbook.org/contents/regularization.html
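For concreteness, here is a minimal sketch of adding L2 weight regularization (plus dropout) to a Keras CNN; the layer sizes, input shape, and the 1e-4 factor are placeholder choices, not taken from the asker's model.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# L2 regularization adds lambda * sum(w^2) to the loss, penalizing large
# weights; lambda (here 1e-4) trades model complexity against fit.
model = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 3),
                  kernel_regularizer=regularizers.l2(1e-4)),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.5),                    # another common regularizer
    layers.Dense(5, activation="softmax"),  # 5 classes, as in the question
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```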
