I am training the following NN with tensorflow:
from tensorflow import keras
from tensorflow.keras.layers import Input, Dense, Dropout, Flatten
import tensorflow as tf

# k_i, k_f and train_datasets are defined elsewhere in my script.
def build_model():
    inputs_layers = []
    concat_layers = []
    for k in range(k_i, k_f + 1):
        kmers = train_datasets[k].shape[1]
        unique_kmers = train_datasets[k].shape[2]
        input = Input(shape=(kmers, unique_kmers))
        inputs_layers.append(input)
        x = Dense(4, activity_regularizer=tf.keras.regularizers.l2(0.2))(input)
        x = Dropout(0.4)(x)
        x = Flatten()(x)
        concat_layers.append(x)
    inputs = keras.layers.concatenate(concat_layers, name='concat_layer')
    x = Dense(4, activation='relu', activity_regularizer=tf.keras.regularizers.l2(0.2))(inputs)
    x = Dropout(0.3)(x)
    x = Flatten()(x)
    outputs = Dense(1, activation='sigmoid')(x)
    return inputs_layers, outputs
I used the for loop to create the input layers because I need the number of inputs to be flexible.
The problem is that when I train this NN, the validation loss initially goes down as the accuracy goes up. But after some point, the validation accuracy starts to go down while the validation loss keeps going down.
I understand that this can happen because the accuracy is measured after the output probabilities are converted into 1 or 0, but I expected this to be an exception when I am not "lucky" with a particular validation set. However, I shuffled my dataset and obtained different validation sets several times, and the outcome is always the same: loss and accuracy go down together.
I understand that the model is overfitting. Despite that, I would still expect to see a correlation between accuracy and loss. I am using a stop_early callback monitoring val_loss. I don't like the idea of changing it to monitor val_accuracy, because I feel I would be losing fitness (I would prevent val_loss from reaching its lowest value).
To me this looks like overfitting.
One technique to stop training in such a case is early stopping (see also the Keras EarlyStopping callback).
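For reference, a minimal sketch of how such a callback could be attached to this model; the monitor, patience and variable names are placeholders, not taken from the question:

import tensorflow as tf

# Stop once val_loss has not improved for `patience` epochs and roll back
# to the best weights seen so far (placeholder values, adjust as needed).
stop_early = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=10,
    restore_best_weights=True)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=200,
          callbacks=[stop_early])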
Avoiding overfitting is tricky. You are already employing one method, the Dropout layer. You can try increasing the dropout probability, but there is a sweet spot, and "wrong" values will just hurt model quality and performance.
The holy grail here is usually "more data". If that is not available, you can try data augmentation.
This is not so unusual, especially when you're using a "regularizer".
Your loss is a sum of the "actual loss" (the one you defined in compile) and the regularization losses.
So, it's perfectly possible that your weights/activations are going down, thus your regularization loss is going down too, while the actual loss may be going up invisibly.
Also, accuracy and loss are not always well connected. Although your case seems extreme, sometimes the accuracy might stop improving while the loss keeps improving (especially in a case where the loss follows a strange logic).
So, some hints:
Try this model without regularization and see what happens.
If you see that the regularization was the problem and you still want it, decrease the regularization coefficients.
Create a callback to stop training a few epochs after the maximum validation accuracy (see the sketch after this list).
Use losses that follow the accuracy relatively well (usually the standard losses do).
Don't forget to check whether your validation data is on the "same scale" as the training data.
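A minimal sketch of the third hint, assuming the built-in EarlyStopping callback is acceptable for this purpose; the patience value is a placeholder:

import tensorflow as tf

# Keep training until val_accuracy has not improved for 5 epochs,
# then restore the weights from the best epoch.
stop_on_acc = tf.keras.callbacks.EarlyStopping(
    monitor='val_accuracy',
    mode='max',
    patience=5,
    restore_best_weights=True)

# pass it to model.fit(..., callbacks=[stop_on_acc])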
Related
I am training a binary classifier that distinguishes disease from non-disease.
When I run the model, the training loss decreases and the AUC and accuracy increase.
But after a certain epoch, the training loss increases and the AUC and accuracy decrease.
I don't know why the training performance degrades after a certain epoch.
I used a general 1D CNN model and methods, details here:
I have already tried:
batch shuffle
introduce class weights
loss change (binary_crossentropy > BinaryFocalLoss)
learning_rate change
Two questions for you going forward.
Do the training and validation accuracy keep dropping if you just let it run for, say, 100 epochs? That is definitely something I would try.
Which optimizer are you using? SGD? Adam?
How large is your dropout rate? Maybe the value is too large. Try training without dropout and check whether the behavior is still the same.
It might also be the optimizer.
Since you do not seem to augment your data (which could be a potential issue if an augmentation accidentally breaks some label affiliation), each epoch should see similar gradients. My guess is therefore that, at this point in the optimization process, the learning rate and thus the update step is not adjusted properly: instead of progressing further into that local optimum, the optimizer oversteps the minimum, decreasing training and validation performance at the same time.
This is an intuitive explanation and the next things I would try are:
Scheduling the learning rate (see the sketch after this list)
Using a more sophisticated optimizer (starting with Adam if you are not already using it)
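A minimal sketch of both suggestions, using the built-in ReduceLROnPlateau callback together with Adam; all values and variable names here are placeholders, not taken from the question:

import tensorflow as tf

# Halve the learning rate whenever the validation loss has not improved
# for 3 epochs (placeholder values).
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=3,
    min_lr=1e-6)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=100,
          callbacks=[reduce_lr])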
Your model is overfitting. This is why your accuracy increases and then begins decreasing. You need to implement early stopping so that training stops at the epoch with the best results. You should also add dropout layers.
I'm trying to overfit my data with episode-based training (https://arxiv.org/abs/1711.06025). I've reproduced the original paper's model and trained it on my data, a 900-image binary classification problem.
The loss seems to have stabilized with the overfitting I was expecting. The problem is that the model only predicts correctly while in training mode.
batch, label = epg.get_episode(X_train)
logits = model(batch, training=True)
# model.predict(batch) gets same result as model(batch, training=False)
print("pred:", K.eval(logits)[0, 0])
print("target:", K.eval(label)[0, 0])
With training=True, I get 99.6% accuracy. Turning it off, I get around 30% (which is less than random for a binary problem).
Note that I'm not changing the data; it's the same data used for training.
Searching for this issue, I found similar ones pointing to the BatchNormalization layer, or to not pre-scaling the data before predicting. That makes sense, but I don't see how it applies to my case, since I'm using the same data generator for both training=True and training=False.
I still don't know exactly what caused this behavior, but I managed to solve it by simply training the model further and further; the predictions with training=False eventually became equal to those with training=True.
I still think it has something to do with the BatchNormalization layer, although this has never happened with my other models before.
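For context, a minimal sketch of what the training flag changes for BatchNormalization: with training=True the layer normalizes with the current batch's statistics (and updates its moving averages), while with training=False it uses those moving averages, which only become reliable after enough training updates. The model and data below are purely illustrative:

import tensorflow as tf

bn_model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(8,)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(1, activation='sigmoid')])

x = tf.random.normal((32, 8))

# Uses the statistics of this batch and updates the moving averages.
train_out = bn_model(x, training=True)

# Uses the stored moving mean/variance; early in training these can be
# far from the batch statistics, so the outputs may differ a lot.
infer_out = bn_model(x, training=False)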
I am building an ANN as below:
model=Sequential()
model.add(Flatten(input_shape=(25,)))
model.add(Dense(25,activation='relu'))
model.add(Dropout(0.8))
model.add(Dense(16,activation='relu'))
model.add(Dropout(0.8))
model.add(Dense(5,activation='relu'))
model.add(Dense(1,activation='sigmoid'))
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
model.fit(xtraindata,ytraindata,epochs=50)
test_loss,test_acc=model.evaluate(xtestdata,ytestdata)
print(test_acc)
I am adding different features to the model and checking whether each newly added feature increases or decreases the accuracy, but the problem is that each time I run this code with the same values I get a different accuracy, sometimes as low as 0.50. I have a few doubts; kindly answer them:
Is the model giving a different accuracy each time because dropout regularization randomly silences different nodes on each run, thereby giving different accuracies, sometimes low and sometimes high?
How can I trust the accuracy of the model if it gives a different accuracy each time? How can I know whether the feature I have added has increased or decreased the accuracy?
If I get a high accuracy and want to reproduce these results, how do I save the parameters that the model has used?
Great questions. Answers:
I think your theory is right; it's the dropout. That's the layer with an explicit element of randomness on each run, so it's the likely culprit (though note that weight initialization and batch shuffling also vary between runs unless you fix the seeds). Try removing that layer, leaving everything else fixed, and run multiple times. Check whether the accuracy is the same.
Cross validation. This article explains how it works, but the gist is that it is a statistical technique that trains and checks the accuracy of multiple runs of your model, all with different slices of data. The average accuracy of all runs is used. So highs and lows will be averaged to a true(ish) accuracy. That being said, if your model has inconsistent results by just varying dropout, it's an indicator that when you move the model to production and use real data, it will perform poorly.
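A minimal sketch of k-fold cross-validation around a Keras model, using scikit-learn's KFold; build_model here is a hypothetical helper that rebuilds and recompiles the architecture from the question, and xtraindata/ytraindata are assumed to be NumPy arrays:

import numpy as np
from sklearn.model_selection import KFold

scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(xtraindata):
    model = build_model()  # placeholder: recreate and compile the same architecture each fold
    model.fit(xtraindata[train_idx], ytraindata[train_idx], epochs=50, verbose=0)
    # evaluate() returns [loss, accuracy] because the model was compiled with metrics=['accuracy']
    _, acc = model.evaluate(xtraindata[val_idx], ytraindata[val_idx], verbose=0)
    scores.append(acc)

print("mean accuracy:", np.mean(scores), "+/-", np.std(scores))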
The Keras API has a method model.save("model_name") to save models. You can use keras.models.load_model("model_name") to get it back. As I said in point 2, though: if your model is so finicky that individual training runs drastically affect accuracy, then even if you train and get good accuracy, it probably won't be useful on new data. So when you say "If I get high accuracy and wanted to reproduce these results", you really shouldn't be thinking along those lines. Instead, try to get consistently high training accuracy.
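A minimal sketch of saving and reloading a trained model, plus fixing the random seeds so dropout masks, weight initialization and shuffling are repeatable; the file name is a placeholder, and tf.keras.utils.set_random_seed requires TF >= 2.7:

import tensorflow as tf
from tensorflow import keras

# Make runs repeatable: sets the Python, NumPy and TensorFlow seeds (TF >= 2.7).
tf.keras.utils.set_random_seed(42)

# Save the trained model (architecture + weights + optimizer state) ...
model.save("my_model.h5")

# ... and load it back later for inference or further training.
restored = keras.models.load_model("my_model.h5")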
I am using tensorflow.keras to train a CNN in an image recognition problem, using the Adam minimiser to minimise a custom loss (some code is at the bottom of the question). I am experimenting with how much data I need to use in my training set, and thought I should look into whether each of my models have properly converged. However, when plotting loss vs number of epochs of training for different training set fractions, I noticed approximately periodic spikes in the loss function, as in the plot below. Here, the different lines show different training set sizes as a fraction of my total dataset.
As I decrease the size of the training set (blue -> orange -> green), the frequency of these spikes appears to decrease, though the amplitude appears to increase. Intuitively, I would associate this kind of behaviour with a minimiser jumping out of a local minimum, but I am not experienced enough with TensorFlow/CNNs to know if that is the correct way to interpret this behaviour. Equally, I can't quite understand the variation with training set size.
Can anyone help me to understand this behaviour? And should I be concerned by these features?
from quasarnet.models import QuasarNET, custom_loss
from tensorflow.keras.optimizers import Adam
...
model = QuasarNET(
    X[0, :, None].shape,
    nlines=len(args.lines) + len(args.lines_bal)
)

loss = []
for i in args.lines:
    loss.append(custom_loss)
for i in args.lines_bal:
    loss.append(custom_loss)

adam = Adam(decay=0.)
model.compile(optimizer=adam, loss=loss, metrics=[])

box, sample_weight = io.objective(z, Y, bal, lines=args.lines,
                                  lines_bal=args.lines_bal)

print("starting fit")
history = model.fit(X[:, :, None], box,
                    epochs=args.epochs,
                    batch_size=256,
                    sample_weight=sample_weight)
Following a discussion with a colleague, I believe that we have solved this problem. By default, the Adam minimiser scales each update by a factor that is inversely proportional to the square root of a running average of the squared gradients in its recent history. When the loss starts to flatten out, the squared gradients shrink, and so the effective learning rate grows. This can happen quite drastically, causing the minimiser to "jump" to a higher-loss point in parameter space.
You can avoid this by setting amsgrad=True when initialising the minimiser (http://www.satyenkale.com/papers/amsgrad.pdf). AMSGrad keeps the running maximum of the squared-gradient average, so the effective learning rate cannot grow in this way, which results in better convergence. The (somewhat basic) plot below shows loss vs number of training epochs for the normal setup, as in the original question ("norm loss"), compared to the loss when setting amsgrad=True in the minimiser ("amsgrad loss").
Clearly, the loss function is much better behaved with amsgrad=True and, with more epochs of training, should result in stable convergence.
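A minimal sketch of that change, applied to the compile step from the question (everything else unchanged):

from tensorflow.keras.optimizers import Adam

# AMSGrad keeps the running maximum of the squared-gradient average,
# so the effective step size cannot grow back as the loss flattens out.
adam = Adam(decay=0., amsgrad=True)
model.compile(optimizer=adam, loss=loss, metrics=[])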
The code in this TensorFlow tutorial uses this section of code to calculate the validation accuracy, right?
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": eval_data},
    y=eval_labels,
    num_epochs=1,
    shuffle=False)

eval_results = mnist_classifier.evaluate(input_fn=eval_input_fn)
print(eval_results)
Question: If I wanted to calculate the training-set accuracy, that is, to see whether my model is overfitting the training data, and I changed the value of "x" to train_data and fed the training data in for evaluation as well, would it give me the training-set accuracy?
If not, how do I check if my model is overfitting my dataset?
How does the number of steps affect the accuracy?
For example, if I have trained it for 20000 steps and then train it for another 100, why does the accuracy change? Is it because the weights are updated all over again? Would it be advisable to do something like this then?
mnist_classifier.train(
    input_fn=train_input_fn,
    steps=20000,
    hooks=[logging_hook])
Normally you have 3 datasets: one for training, one for validation and one for testing. All these datasets have to be disjoint; an image from the training set may not occur in the validation or test set, and so on. You train with the training set and, after each epoch, you validate the model with the validation data. The optimizer will always try to update the weights to classify the training data perfectly, so the training accuracy will get very high (>90%). The validation data is data the model has never seen before; validation is done after each epoch (or every x steps) to show how well the model reacts to unseen data, and this shows how the model improves over time.
The more you train, the higher the training accuracy becomes, since the optimizer does its best to get that value to 100%. The validation accuracy, which does not update the weights, also increases over time, but not continuously. While the training accuracy keeps improving, the validation accuracy might stop improving, and the moment the validation accuracy starts to decrease, you are overfitting. This means that the model is focusing too much on the training data and can no longer classify a character correctly if it differs from the training set.
At the end of all the training you use the test set; this determines the actual accuracy of your model on new data.
@xmacz: I cannot add comments yet, only answers, so I am just updating my answer. Yes, I checked the source code: your first lines of code evaluate the model on the test data.
evaluate is just a function that applies some numerical operations to the input data and produces an output. If you use it on the test data it should give you the test accuracy, and if you feed in the training data it should output the training accuracy.
At the end of the day it is just mathematics; what the output means intuitively is something you have to ascertain yourself.
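A minimal sketch of that idea, reusing the input-function pattern from the question to measure training-set accuracy; train_data and train_labels are assumed to be the tutorial's training arrays:

train_eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": train_data},
    y=train_labels,
    num_epochs=1,
    shuffle=False)

# Same estimator, evaluated on the data it was trained on: if these metrics
# are much better than the test metrics, the model is overfitting.
train_results = mnist_classifier.evaluate(input_fn=train_eval_input_fn)
print(train_results)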
Checking whether your model is overfitting is something you do while training the model. You have to set apart another set, called the validation dataset, which is different from the test and training sets. A typical split is 70%-20%-10% for training, testing and validation respectively.
During training, every n steps you evaluate your model on the validation dataset. During the first iterations the score on your validation set gets better, but at some point it starts getting worse. You can use this information to stop training when your model starts to overfit, but doing it right is an art. You could, for instance, stop after your validation accuracy has decreased for 5 consecutive evaluations, because sometimes it gets worse in one evaluation and better in the next. It's hard to say; it depends on many factors.
Regarding your second question: iterating another 100 steps could make your model better or worse, depending on whether it is overfitting or not, so I'm afraid that question doesn't have a clear answer. The weights rarely stop changing, because the iterations/steps keep "moving" them, for good or for bad. Again, it's difficult to say how to get good results, but you could try early stopping with a validation set, as mentioned before (a sketch follows below).
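A minimal sketch of that kind of manual early stopping with the estimator API; the chunk size, the patience of 5 evaluations and val_input_fn are placeholders, and "accuracy" is assumed to be one of the estimator's evaluation metrics:

best_acc = 0.0
bad_evals = 0

# Train in chunks of 1000 steps and evaluate on the validation set each time;
# stop once validation accuracy has not improved for 5 consecutive evaluations.
for _ in range(100):
    mnist_classifier.train(input_fn=train_input_fn, steps=1000)
    metrics = mnist_classifier.evaluate(input_fn=val_input_fn)
    if metrics["accuracy"] > best_acc:
        best_acc = metrics["accuracy"]
        bad_evals = 0
    else:
        bad_evals += 1
    if bad_evals >= 5:
        break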