I wrote Keras code to train GoogleNet. However, the accuracy reported by fit() reaches 100%, while evaluate() on the same training dataset reports only 25%, which is a huge discrepancy. Also, unlike the accuracy from fit(), the accuracy from evaluate() does not improve with more training; it stays at roughly 25%.
Does anyone have an idea of what is wrong here?
# Training dataset and labels are given. Load the GoogleNet model.
from keras.models import load_model

model = load_model('FT_InceptionV3.h5')

# Training phase
model.fit(x=X_train,
          y=y_train,
          batch_size=5,
          epochs=20,
          validation_split=0,
          # callbacks=[tensorboard]
          )

# Testing phase
train_loss, train_acc = model.evaluate(X_train, y_train, verbose=1)
print("Train loss =", train_loss, "Train accuracy =", train_acc)
[Screenshots: training result and testing result]
After some digging into Keras issues, I found this.
The reason is that when you use fit, the weights are updated at each batch of the training data. The loss (and accuracy) returned by fit is therefore not the value for the final model, but the mean over all the slightly different models used on each batch.
When you use evaluate, on the other hand, one and the same model is applied to the whole dataset. That final model never even contributes to the values reported by fit, because the loss computed on the last training batch is still used to update the weights afterwards.
To sum everything up, fit and evaluate have two completely different behaviours.
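As a quick sanity check, here is a minimal sketch reusing X_train and y_train from the question, comparing the running metric reported by fit with a fresh evaluation of the final weights (depending on how the model was compiled, the history key may be 'acc' instead of 'accuracy'):

history = model.fit(x=X_train, y=y_train, batch_size=5, epochs=20, validation_split=0)

# Running mean accumulated over batches whose weights kept changing
print("fit (last epoch, running mean):", history.history['accuracy'][-1])

# Single pass over the data with the final, fixed weights
loss, acc = model.evaluate(X_train, y_train, verbose=0)
print("evaluate (final weights):", acc)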
References:
Keras_issues_thread
Keras_official_doc
Related
I am using this model. While using it, the validation accuracy is increasing, but at the same time the validation loss is also increasing. What is happening here?
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import Adam

model_alpha1 = Sequential()
model_alpha1.add(Dense(64, input_dim=96, activation='relu'))
model_alpha1.add(Dense(2, activation='softmax'))

opt_alpha1 = Adam(learning_rate=0.001)
model_alpha1.compile(loss='sparse_categorical_crossentropy', optimizer=opt_alpha1,
                     metrics=['accuracy'])

history = model_alpha1.fit(x_train, y_train, validation_data=(x_test, y_test),
                           epochs=200, verbose=1)
If you need any more details I will provide them, just leave a comment. Thank you.
It seems that your model is overfitting. In other words, your model fits the training data too closely, which is why it no longer performs as well on the validation data.
The typical way to prevent overfitting is to use regularization techniques, for example:
Dropout layers https://keras.io/api/layers/regularization_layers/dropout/
Early stopping https://keras.io/api/callbacks/early_stopping/
Noise https://keras.io/api/layers/regularization_layers/gaussian_noise/
Try a less deep network for your problem, or try dropout layers (or both, depending on how each affects the results). From your figure, we can see that the overfitting starts after roughly 25 epochs.
Overfitting may be caused, for example, by using a model that is too complex for a dataset that is not large enough, or simply by training the model for too long (early stopping fixes the latter).
Here are some regularization examples with TF: https://tensorflow.rstudio.com/tutorials/beginners/basic-ml/tutorial_overfit_underfit/
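As a minimal sketch of the dropout and early-stopping suggestions applied to the model from the question (the dropout rate and patience value are illustrative assumptions, not tuned values):

from keras.layers import Dense, Dropout
from keras.models import Sequential
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping

model_alpha1 = Sequential()
model_alpha1.add(Dense(64, input_dim=96, activation='relu'))
model_alpha1.add(Dropout(0.3))  # randomly drop 30% of the activations during training
model_alpha1.add(Dense(2, activation='softmax'))

model_alpha1.compile(loss='sparse_categorical_crossentropy',
                     optimizer=Adam(learning_rate=0.001),
                     metrics=['accuracy'])

# Stop once the validation loss has not improved for 10 epochs and keep the best weights
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

history = model_alpha1.fit(x_train, y_train,
                           validation_data=(x_test, y_test),
                           epochs=200, verbose=1,
                           callbacks=[early_stop])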
When training a classification model, it is not possible to optimize for accuracy directly, since accuracy is not a differentiable function. Therefore, we use cross-entropy as the loss function, which is highly correlated with accuracy. When inspecting our results, it is important to remember that these are still two different quantities.
In terms of CE loss, your model is exhibiting textbook overfitting. However, in terms of accuracy, which is what you are actually interested in, it has simply "finished training". This is why we track not only the loss but also the metrics we actually care about, so that we base our decisions on them.
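A toy illustration of why the two can move in different directions (hypothetical numbers, not from the question): both predictions below are counted as correct, so accuracy is identical, yet their cross-entropy losses differ by a factor of about 60.

import numpy as np

def cross_entropy(y_true, y_pred):
    # y_true is one-hot, y_pred is a predicted probability distribution
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([1.0, 0.0])          # true class is class 0

p_confident = np.array([0.99, 0.01])   # argmax = 0 -> counted as correct
p_hesitant = np.array([0.55, 0.45])    # argmax = 0 -> also counted as correct

print(cross_entropy(y_true, p_confident))  # ~0.01
print(cross_entropy(y_true, p_hesitant))   # ~0.60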
While doing transfer learning with VGG, with a decent amount of data and the following configuration:
import tensorflow as tf
from tensorflow.keras.layers import BatchNormalization, GlobalAveragePooling2D, Dense

base_big_3 = tf.keras.applications.VGG19(include_top=False, weights='imagenet',
                                         input_shape=[IMG_SIZE, IMG_SIZE, 3])

model_big_3 = tf.keras.Sequential()
model_big_3.add(base_big_3)
model_big_3.add(BatchNormalization(axis=-1))
model_big_3.add(GlobalAveragePooling2D())
model_big_3.add(Dense(5, activation='softmax'))

model_big_3.compile(loss=tf.keras.losses.CategoricalCrossentropy(),
                    optimizer=tf.keras.optimizers.Adamax(learning_rate=0.01),
                    metrics=['acc'])

history = model_big_3.fit(
    train_generator,
    steps_per_epoch=BATCH_SIZE,
    epochs=100,
    validation_data=valid_generator,
    batch_size=BATCH_SIZE
)
The training loss and validation loss vary as below: the training loss is constant throughout, while the validation loss spikes initially and then flattens out:
What I tried out
I tried the solutions given here one by one and decreased the learning rate from 0.01 to 0.0001. This time the training loss did go down slightly, but the validation loss still fluctuates heavily. The training loss and validation loss now vary as below:
The solution linked above also suggests normalizing the input, but in my opinion the images don't need to be normalized, because the data doesn't vary much and the VGG network already has batch normalization; please correct me if I'm wrong. Please point out what is causing this behaviour, what to change in the configuration, and how I can improve training.
One thing I see is that you set steps_per_epoch=BATCH_SIZE. Assume you have 3200 training samples and BATCH_SIZE=32. To go through all of your training samples you would have to go through 3200/32 = 100 batches, but with steps_per_epoch=BATCH_SIZE=32 you only go through 32*32 = 1024 samples in an epoch. Set steps_per_epoch as
steps_per_epoch = number_of_train_samples // BATCH_SIZE
where BATCH_SIZE is whatever you specified in the generator. Alternatively you can leave it as None and model.fit will determine the right value internally.
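For example, assuming train_generator was created with ImageDataGenerator.flow_from_directory (which exposes a .samples attribute; otherwise substitute your own sample count):

steps_per_epoch = train_generator.samples // BATCH_SIZE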
As stated in the model.fit documentation located here,
Do not specify the batch_size if your data is in the form of datasets,
generators, or keras.utils.Sequence instances (since they generate batches).
Since in model.fit you use train_generator, I assume this is a generator.
The VGG model was trained on ImageNet images whose pixel values were rescaled, so somewhere in your input pipeline you should rescale your images the same way; for example, image = image/127.5 - 1 maps pixels into the range -1 to +1. What BATCH_SIZE did you use? Making it larger (within the limits of your memory) may help smooth out the fluctuations.
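For example, if the generators were built with ImageDataGenerator (an assumption; the question does not show how train_generator was created), the rescaling can be applied with preprocessing_function (TRAIN_DIR and VALID_DIR are placeholder paths):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

def rescale(img):
    # Map raw pixel values [0, 255] into [-1, +1] before they reach the network
    return img / 127.5 - 1.0

train_datagen = ImageDataGenerator(preprocessing_function=rescale)
valid_datagen = ImageDataGenerator(preprocessing_function=rescale)

train_generator = train_datagen.flow_from_directory(TRAIN_DIR, target_size=(IMG_SIZE, IMG_SIZE),
                                                    batch_size=BATCH_SIZE, class_mode='categorical')
valid_generator = valid_datagen.flow_from_directory(VALID_DIR, target_size=(IMG_SIZE, IMG_SIZE),
                                                    batch_size=BATCH_SIZE, class_mode='categorical')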
I also recommend you use two Keras callbacks, EarlyStopping and ReduceLROnPlateau. Documentation is here. Set them up to monitor the validation loss. My suggested code is shown below:
estop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=4, verbose=1,
                                         restore_best_weights=True)
rlronp = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                              patience=2, verbose=1)
callbacks = [estop, rlronp]
# in model.fit add callbacks=callbacks
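Putting the pieces together, a sketch of the adjusted fit call might look like this (number_of_train_samples is a placeholder for your actual training-set size):

steps_per_epoch = number_of_train_samples // BATCH_SIZE  # or simply leave steps_per_epoch=None

history = model_big_3.fit(
    train_generator,
    steps_per_epoch=steps_per_epoch,
    epochs=100,
    validation_data=valid_generator,
    callbacks=callbacks  # batch_size is omitted: the generator already produces batches
)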
I am new to LSTMs and machine learning and I am trying to understand some of the concepts. Below is the code of my LSTM model.
LSTM model:
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense
from keras.callbacks import EarlyStopping

model = Sequential()
model.add(Embedding(vocab_size, 50, input_length=max_length-1))
model.add(LSTM(50))
model.add(Dropout(0.1))
model.add(Dense(vocab_size, activation='softmax'))

early_stopping = EarlyStopping(monitor='val_loss', patience=42)

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X, y, validation_split=0.2, epochs=500, verbose=2, batch_size=20)
Below is a sample of my output:
And the train/test accuracy and train/test loss diagrams:
My understanding (and please correct me if I am wrong) is that val_loss and val_accuracy are the loss and accuracy on the test data. My question is: what are the train accuracy and train loss, and how are these values computed? Thank you.
1. loss and val_loss
In deep learning, the loss is the value that a neural network tries to minimize: the network learns by adjusting its weights and biases in a way that reduces the loss.
loss and val_loss differ because the former is computed on the training set and the latter on the held-out validation/test set. As such, val_loss is a good indication of how the model performs on unseen data.
2. accuracy and val_accuracy
Once again, acc is computed on the training data and val_acc on the validation data. It's best to rely on val_acc for a fair picture of model performance, because a sufficiently large neural network will eventually fit the training data essentially perfectly, yet still perform poorly on unseen data.
Training should be stopped when val_acc stops increasing, otherwise your model will probably overfit. You can use the EarlyStopping callback to stop training automatically.
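For example, a minimal sketch applied to the model from the question (the patience value is just an illustrative assumption; in older Keras versions the monitored key is 'val_acc' rather than 'val_accuracy'):

from keras.callbacks import EarlyStopping

# Stop when val_accuracy has not improved for 5 consecutive epochs
# and roll back to the best weights seen so far.
early_stopping = EarlyStopping(monitor='val_accuracy', patience=5,
                               restore_best_weights=True)

history = model.fit(X, y, validation_split=0.2, epochs=500, verbose=2,
                    batch_size=20, callbacks=[early_stopping])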
3. Why do we need train accuracy and loss?
Training accuracy on its own is not a meaningful evaluation metric, because a neural network with sufficient parameters can essentially memorize the labels of the training data and then perform no better than random guessing on previously unseen examples.
However, it can be useful to monitor accuracy and loss at some fixed interval during training, as this may indicate whether the backend is functioning as expected and whether the training process needs to be stopped.
Refer here for a detailed explanation of early stopping.
4. How are accuracy and loss calculated?
Loss and accuracy are calculated as you train, according to the loss function and metrics specified when compiling the model. Before training, you must compile the model to configure the learning process: this is where you specify the optimizer, the loss function, and the metrics, which is how fit knows which loss to minimize and which metrics to keep track of.
The documentation for loss functions (like binary cross-entropy) can be found here, and the documentation for metrics (like accuracy) can be found here.
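As a small self-contained sketch of where the loss and metrics are configured (the architecture and the random data are placeholders, not the model from the question):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Dummy data so the snippet runs on its own
X = np.random.rand(100, 10)
y = np.random.randint(0, 2, size=(100,))

model = Sequential([Dense(32, activation='relu', input_shape=(10,)),
                    Dense(1, activation='sigmoid')])

# The loss ('binary_crossentropy') is what training minimizes;
# the metrics (['accuracy']) are only tracked and reported.
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

history = model.fit(X, y, validation_split=0.2, epochs=10, verbose=0)

# One value per epoch for 'loss', 'accuracy', 'val_loss' and 'val_accuracy'
print(history.history.keys())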
Problem
I am trying to build a regression model in TensorFlow using the Dataset and Keras APIs. The target contains quite a lot of zeros, and the non-zero values are roughly normally distributed, though all are positive.
When I create the linear model below, I notice that on the first epoch the validation loss and both validation metrics are exactly 0, although the training loss and metrics are not. The problem disappears after the first epoch. Although this doesn't keep me from creating the model, I still cannot explain why it happens or what I might do about it.
Tried so far
What I have unsuccessfully tried in order to pinpoint where the problem lies:
Swapping train and test
A smaller batch size (32 instead of 255)
Shuffling test
Setting the initializers of the dense layer to normal
My code
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import DenseFeatures, Dense

train = tf.data.experimental.make_csv_dataset(...)
test = tf.data.experimental.make_csv_dataset(...)  # a smaller csv file

model = Sequential([
    DenseFeatures(features),
    Dense(1, activation='linear')  # we are doing a regression
])

model.compile(
    optimizer='adam',
    loss='mean_squared_error',
    metrics=['mean_squared_error', 'mean_absolute_error']
)

model.fit(train,
          epochs=2,
          validation_data=test
          )
This is completely normal, because training metrics and validation metrics are computed at different times.
Training metrics are computed during the epoch, so you can see the loss score and the other values as they come out of training.
Validation metrics are used to check whether the parameters obtained in a given epoch are any good. The held-out data (for example the fraction you specify with validation_split, or a separate validation set) is not used for training; it is used instead to measure the network's performance. These values are therefore computed at the end of the epoch, once you have a fixed parameter configuration.
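Conceptually (a sketch reusing the model, train and test objects from the question), the per-epoch validation numbers are equivalent to evaluating the held-out data after each epoch's weight updates:

for epoch in range(2):
    # Weights change batch by batch during this call; the reported training
    # metrics are a running mean over those changing models.
    model.fit(train, epochs=1, verbose=1)

    # Validation metrics are computed once, at the end of the epoch,
    # with the weights frozen at their end-of-epoch values.
    val_results = model.evaluate(test, verbose=0)
    print("epoch", epoch, "validation:", val_results)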
I have built a Keras model and while training, the categorical accuracy metric reaches 0.78.
However, after training, when I predict on the same training data with the following code:
predicted_labels = model.predict(input_data)
acc = sklearn.metrics.accuracy_score(true_labels, predicted_labels)
the accuracy is 0.39.
To summarize, I don't get the same accuracy result from Keras and sklearn.
There are many ways of measuring accuracy, and sklearn might not be using the same as Keras.
You may take your compiled model and run lossAndMetrics = model.evaluate(input_data, true_labels) to see a loss and metrics that are guaranteed to be the same ones used during training.
PS: it's not rare to have a bad result for test/validation data if your model is overfitting.
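One common source of the mismatch (an assumption here, since the question does not show the label format) is that model.predict returns class probabilities, while sklearn.metrics.accuracy_score expects hard class labels. A sketch of a like-for-like comparison:

import numpy as np
import sklearn.metrics

# Keras-side accuracy, using exactly the metric the model was compiled with
loss, keras_acc = model.evaluate(input_data, true_labels, verbose=0)

# sklearn-side accuracy: convert predicted probabilities to class indices first
probs = model.predict(input_data)
predicted_classes = np.argmax(probs, axis=1)

# If true_labels are one-hot encoded, convert them to class indices as well
true_classes = np.argmax(true_labels, axis=1) if true_labels.ndim > 1 else true_labels

sk_acc = sklearn.metrics.accuracy_score(true_classes, predicted_classes)
print(keras_acc, sk_acc)  # these should now agree (up to rounding)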