Validation loss is zero on first epoch only - python

Problem
I am trying to build a regression model in TensorFlow using the Dataset and Keras APIs. The target contains quite a lot of zeros, and the non-zero values are roughly normally distributed, though all are positive.
When I try to create the linear model below, I notice that on the first epoch the validation loss and both validation metrics are exactly 0, though the training loss and metrics are not. The problem disappears after the first epoch. Although this doesn't keep me from creating the model, I still cannot explain why it happens or what I might do about it.
Tried so far
What I have unsuccessfully tried in order to pinpoint where the problem lies:
Swapping train and test
A smaller batch size (32 instead of 255)
Shuffling test
Setting the initializers of the dense layer to normal
My code
train = tf.data.experimental.make_csv_dataset(...)
test = tf.data.experimental.make_csv_dataset(...)  # a smaller csv file

model = Sequential([
    DenseFeatures(features),
    Dense(1, activation='linear')  # we are doing a regression
])

model.compile(
    optimizer='adam',
    loss='mean_squared_error',
    metrics=['mean_squared_error', 'mean_absolute_error']
)

model.fit(train,
          epochs=2,
          validation_data=test)

This is completely normal, because training metrics and validation metrics are computed at different times.
Training metrics are computed during the epoch, so you can see the loss and the other metric values produced by the training itself; they are running averages over the batches seen so far, while the weights are still changing.
Validation metrics are used to check whether the parameters obtained in a given epoch are any good. With validation_split, a percentage of the training data (you can specify how much) is held out and not used for training but used to evaluate the network; with validation_data, as in your code, a separate dataset is used instead. Either way, these values are computed once at the end of the epoch, with a fixed parameter configuration.
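For example (a minimal sketch, using NumPy arrays and validation_split rather than the question's tf.data datasets), the History object shows both kinds of values: 'loss' is averaged over each epoch's batches while the weights are still changing, whereas 'val_loss' is computed once at the end of each epoch with the weights fixed.

# Hypothetical sketch: X and y are NumPy arrays, model is any compiled Keras model.
history = model.fit(X, y, epochs=2, validation_split=0.2)  # hold out 20% for validation

print(history.history['loss'])      # running average over each epoch's batches
print(history.history['val_loss'])  # computed once per epoch, with the weights fixed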

Related

Find predicted classification on validation data set

I am working on a neural network using TensorFlow that takes in feature vectors of length 1476 and attempts to classify each feature vector into one of 6 categories (labels). The NN itself does not have a problem, but I have been unsuccessful in finding out how to get the predicted labels on the validation data set. For reference, here is my neural network:
# define the model
model = Sequential()
model.add(Dense(units=100, activation='tanh', input_dim=1476))
model.add(Dropout(0.5))
model.add(Dense(units=100, activation='tanh'))
model.add(Dropout(0.5))
model.add(Dense(units=6, activation='tanh'))
And here is the model being fitted with the validation data set included (note that I recognize fit_generator should be converted to the fit method):
model.fit_generator(generator(2000),
                    epochs=3000,
                    steps_per_epoch=1,
                    validation_data=validationGenerator(),
                    validation_steps=40,
                    verbose=2,
                    callbacks=[tensorboard, checkpoint, csv_logger])
The generator and validationGenerator methods just fetch the training/validation data and labels to be fed in. When the model is run, I am able to see the standard output per epoch, containing the training accuracy and loss along with the validation accuracy and loss.
My question is whether there is a way to see more detail about the validation accuracy. I want to be able to see the predicted outputs vs. the ground truths for each feature vector in the validation data set per epoch. Is this at all possible? I want to be able to see if my model is classifying heavily towards only a few specific labels.
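One possible approach (a sketch, not part of the original thread; it assumes the validation data can be materialized as NumPy arrays X_val and y_val) is a custom callback that runs model.predict on the validation set at the end of every epoch:

import numpy as np
import tensorflow as tf

class ValidationPredictionLogger(tf.keras.callbacks.Callback):
    """Log predicted vs. true labels on the validation set after every epoch."""

    def __init__(self, X_val, y_val):
        super().__init__()
        self.X_val = X_val
        self.y_val = y_val  # assumed one-hot encoded with shape (n_samples, 6)

    def on_epoch_end(self, epoch, logs=None):
        pred = np.argmax(self.model.predict(self.X_val, verbose=0), axis=1)
        true = np.argmax(self.y_val, axis=1)
        # per-class prediction counts show whether the model collapses onto a few labels
        print("epoch", epoch, "predicted label counts:", np.bincount(pred, minlength=6))
        print("epoch", epoch, "true label counts:     ", np.bincount(true, minlength=6))

# usage (X_val, y_val are hypothetical here): append the callback to the existing list, e.g.
# callbacks=[tensorboard, checkpoint, csv_logger, ValidationPredictionLogger(X_val, y_val)]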

Training loss stays constant while validation loss fluctuates heavily

While doing transfer learning on VGG, with a decent amount of data and with the following configuration:
base_big_3 = tf.keras.applications.VGG19(include_top=False, weights='imagenet',
                                         input_shape=[IMG_SIZE, IMG_SIZE, 3])

model_big_3 = tf.keras.Sequential()
model_big_3.add(base_big_3)
model_big_3.add(BatchNormalization(axis=-1))
model_big_3.add(GlobalAveragePooling2D())
model_big_3.add(Dense(5, activation='softmax'))

model_big_3.compile(loss=tf.keras.losses.CategoricalCrossentropy(),
                    optimizer=tf.keras.optimizers.Adamax(learning_rate=0.01),
                    metrics=['acc'])

history = model_big_3.fit(
    train_generator,
    steps_per_epoch=BATCH_SIZE,
    epochs=100,
    validation_data=valid_generator,
    batch_size=BATCH_SIZE
)
The training loss and validation loss vary as below: the training loss is constant throughout, while the validation loss spikes initially and then becomes constant:
What I tried out
I tried the solutions given here one by one and decreased the learning rate from 0.01 to 0.0001. This time the training loss did go down slightly, but the validation loss still fluctuates heavily. The training loss and validation loss now vary as below:
The linked solution also suggests normalizing the input, but in my opinion the images don't need to be normalized, because the data doesn't vary much and the VGG network already has batch normalization; please correct me if I'm wrong. Please point out what is leading to this kind of behavior, what to change in the configuration, and how I can improve training.
One thing I see is that you set steps_per_epoch=BATCH_SIZE. Assume you have 3200 training samples and BATCH_SIZE=32. To go through all of your training samples you would have to run 3200/32 = 100 batches per epoch, but with steps_per_epoch=BATCH_SIZE=32 you only go through 32*32 = 1024 samples in an epoch. Set steps_per_epoch as
steps_per_epoch = number_of_train_samples // BATCH_SIZE
where BATCH_SIZE is whatever you specified in the generator. Alternatively, you can leave it as None and model.fit will determine the right value internally.
As stated in the model.fit documentation:
Do not specify the batch_size if your data is in the form of datasets,
generators, or keras.utils.Sequence instances (since they generate batches).
Since you pass train_generator to model.fit, I assume it is a generator, so you should drop the batch_size argument from the model.fit call.
The VGG model was trained on ImageNet images that were preprocessed in a specific way (for Keras' VGG19 this means BGR channel order with the ImageNet channel means subtracted), so somewhere in your input pipeline you should apply the same preprocessing, for example with tf.keras.applications.vgg19.preprocess_input. What BATCH_SIZE did you use? Making it larger (within the limits of your memory size) may help smooth out the fluctuations.
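If train_generator and valid_generator come from an ImageDataGenerator (an assumption; the question does not show how they are built), the matching preprocessing can be attached directly, as in this sketch (the directory path is hypothetical):

from tensorflow.keras.applications.vgg19 import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# preprocess_input for VGG19 converts RGB to BGR and subtracts the ImageNet channel means
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
train_generator = train_datagen.flow_from_directory(
    'path/to/train',                      # hypothetical directory layout
    target_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH_SIZE,
    class_mode='categorical')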
I also recommend you use two Keras callbacks, EarlyStopping and ReduceLROnPlateau. Documentation is here. Set them up to monitor the validation loss. My suggested code is shown below:
estop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=4, verbose=1,
                                         restore_best_weights=True)
rlronp = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                              patience=2, verbose=1)
callbacks = [estop, rlronp]
# in model.fit add callbacks=callbacks

What are training accuracy and training loss and why do we need to compute them?

I am new to LSTMs and machine learning and I am trying to understand some of their concepts. Below is the code of my LSTM model.
LSTM model:
model = Sequential()
model.add(Embedding(vocab_size, 50, input_length=max_length-1))
model.add(LSTM(50))
model.add(Dropout(0.1))
model.add(Dense(vocab_size, activation='softmax'))

early_stopping = EarlyStopping(monitor='val_loss', patience=42)

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X, y, validation_split=0.2, epochs=500, verbose=2, batch_size=20)
Below is a sample of my output:
And the train/test accuracy and train/test loss diagrams:
My understanding (and please correct me if I am wrong) is that val_loss and val_accuracy are the loss and accuracy on the test data. My question is: what are the training accuracy and training loss, and how are these values computed? Thank you.
1. loss and val_loss
In deep learning, the loss is the value that a neural network is trying to minimize. That is how a neural network learns by adjusting weights and biases in a manner that reduces the loss.
loss and val_loss differ because the former is applied to the train set, and the latter to the test set. As such, the latter is a good indication of how the model performs on unseen data.
2. accuracy and val_accuracy
Once again, acc is on the training data, and val_acc is on the validation data. It's best to rely on val_acc for a fair representation of model performance because a good neural network will end up fitting the training data at 100%, but would perform poorly on unseen data.
Training should be stopped when val_acc stops increasing, otherwise your model will probably overfit. You can use the EarlyStopping callback to stop training.
3. Why do we need train accuracy and loss?
Training accuracy on its own is not a meaningful evaluation metric, because a neural network with sufficient parameters can essentially memorize the labels of the training data and then perform no better than random guessing on previously unseen examples.
However, it can be useful to monitor the accuracy and loss at some fixed interval during training, as this may indicate whether the backend is functioning as expected and whether the training process needs to be stopped.
Refer here for a detailed explanation of early stopping.
4. How are accuracy and loss calculated?
Loss and accuracy are calculated as you train, according to the loss function and metrics specified when compiling the model. Before you train, you must compile your model to configure the learning process; this is where you specify the optimizer, loss function, and metrics, which in turn is how model.fit knows what loss function to use, what metrics to keep track of, and so on.
The loss function (like binary cross entropy) documentation can be found here and the metrics (like accuracy) documentation can be found here.
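As a small illustration (a sketch reusing the names from the question's code, since the early_stopping callback defined there is never actually passed to fit), early stopping on val_loss is enabled through the callbacks argument:

early_stopping = EarlyStopping(monitor='val_loss', patience=42)

# the callback only takes effect if it is passed to fit
history = model.fit(X, y, validation_split=0.2, epochs=500, verbose=2,
                    batch_size=20, callbacks=[early_stopping])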

Keras loss and metrics values do not match with same function in each

I am using keras with a custom loss function like below:
def custom_fn(y_true, y_pred):
    # changing y_true, y_pred values systematically
    return mean_absolute_percentage_error(y_true, y_pred)
Then I am calling model.compile(loss=custom_fn) and model.fit(X, y, ..., validation_data=(X_val, y_val), ...).
Keras then saves loss and val_loss in the model history. As a sanity check, when the model finishes training, I call model.predict(X_val) so I can calculate the validation loss manually with my custom_fn, using the trained model.
I am saving the model with the best epoch using this callback:
callbacks.append(ModelCheckpoint(path, save_best_only=True, monitor='val_loss', mode='min'))
so after calculating this, the validation loss should match keras' val_loss value of the best epoch. But this is not happening.
As another attempt to figure this issue out, I am also doing this:
model.compile(loss=custom_fn, metrics=[custom_fn])
And to my surprise, val_loss and val_custom_fn do not match (nor do loss and loss_custom_fn, for that matter).
This is really strange; my custom_fn is essentially Keras' built-in mape with y_true and y_pred slightly manipulated. What is going on here?
PS: the layers I am using are LSTM layers and a final Dense layer, but I think this information is not relevant to the problem. I am also using regularisation as a hyperparameter, but not dropout.
Update
Even removing custom_fn and using Keras' built-in mape as the loss function and metric, like so:
model.compile(loss='mape', metrics=['mape'])
and, for simplicity, removing the ModelCheckpoint callback has the same effect: val_loss and val_mape are not equal in any epoch. This is extremely strange to me. I am either missing something or there is a bug in the Keras code; the former is probably more realistic.
This blog post suggests that Keras adds any regularisation penalties used during training to the loss when calculating the validation loss, whereas no regularisation is applied when calculating the metric of choice. This is why the mismatch occurs with any loss function, as stated in the question.
This is something I could not find documented for Keras. However, it seems to hold up, since when I remove all regularisation hyperparameters, val_loss and val_custom_fn match exactly in every epoch.
An easy workaround is to use custom_fn as a metric and save the best model based on that metric (val_custom_fn) rather than on val_loss. Alternatively, loop through each epoch manually and calculate the correct val_loss yourself after training each epoch. The latter seems to make more sense, since there is no reason to include custom_fn both as a metric and as a loss function.
If anyone can find any evidence of this in the Keras documentation that would be helpful.
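A sketch of the first workaround, reusing the names from the question (path, X, y, X_val, y_val): checkpoint on the un-regularised metric rather than on the regularised loss. Keras logs the metric under the function's name, so it appears as val_custom_fn, as observed in the question.

model.compile(loss=custom_fn, metrics=[custom_fn])

# save the best model according to the metric, which excludes regularisation penalties
checkpoint = ModelCheckpoint(path, save_best_only=True,
                             monitor='val_custom_fn', mode='min')
model.fit(X, y, validation_data=(X_val, y_val), callbacks=[checkpoint])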

Different accuracy by fit() and evaluate() in Keras with the same dataset

I wrote Keras code to train GoogleNet. However, the accuracy obtained from fit() is 100%, yet with the same training dataset used for evaluate() the accuracy remains only 25%, which is a huge discrepancy! Also, unlike fit(), the accuracy from evaluate() does not improve with more training; it stays at roughly 25%.
Does anyone have an idea of what is wrong in this situation?
# Training dataset and labels are given. Here, load the GoogleNet model.
from keras.models import load_model
model = load_model('FT_InceptionV3.h5')

# Training phase
model.fit(x=X_train,
          y=y_train,
          batch_size=5,
          epochs=20,
          validation_split=0,
          # callbacks=[tensorboard]
          )

# Testing phase
train_loss, train_acc = model.evaluate(X_train, y_train, verbose=1)
print("Train loss=", train_loss, "Train accuracy", train_acc)
Training Result
Testing Result
After some digging into Keras issues, I found this.
The reason for this is that when you use fit, the weights are updated at each batch of the training data. The loss value returned by fit is not the loss of the final model, but the mean of the losses of all the slightly different models used on each batch.
On the other hand, when you use evaluate, one and the same model is used on the whole dataset. That final model does not even appear in the loss reported by fit, since the loss computed on the last batch of training is used to update the model's weights one more time.
To sum everything up, fit and evaluate have two completely different behaviours.
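A quick way to see this effect (a sketch, assuming the X_train, y_train arrays and the loaded model from the question) is to re-evaluate on the training data after every epoch, so the metric is always computed for one fixed set of weights:

# fit() reports a running average over batches whose weights keep changing;
# evaluate() scores a single, fixed set of weights on the same data.
for epoch in range(20):
    hist = model.fit(x=X_train, y=y_train, batch_size=5, epochs=1, verbose=0)
    eval_loss, eval_acc = model.evaluate(X_train, y_train, verbose=0)
    print("epoch", epoch,
          "fit-reported:", hist.history,          # e.g. {'loss': [...], 'acc': [...]}
          "evaluate:", eval_loss, eval_acc)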
References:
Keras_issues_thread
Keras_official_doc
