Keras progress validation accuracy not showing correctly - python

Currently I'm training my model using the following fit_generator call:
history = finetune_model.fit_generator(train_generator, epochs=NUM_EPOCHS, workers=1,
                                       steps_per_epoch=num_train_images // batch_size,
                                       validation_data=(x_val, y_val_))
I'm also using the Docker image tensorflow/tensorflow:1.15.0-gpu-py3-jupyter from Docker Hub.
Here is what the output currently looks like:
Epoch 38/40
61/62 [============================>.] - ETA: 0s - loss: 0.4109 - acc: 0.9536Epoch 1/40
420/62 [===========================================================================================================================================================================================================] - 2s 4ms/sample - loss: 0.6136 - acc: 0.7190
However in Colaboratory, the output is this:
Epoch 38/40
62/62 [==============================] - 13s 212ms/step - loss: 0.4069 - acc: 0.8997 - val_loss: 0.7886 - val_acc: 0.752

Related

Tensorflow model show output every x epochs

Is it possible to show output of a model after each x epochs?
epochs = 500
model.fit(
    X_train, y_train,
    batch_size=16, epochs=epochs,
    validation_data=(X_test, y_test)
)
I get output like:
Epoch 1/1000
3285/3285 [==============================] - 2s 592us/step - loss: 0.7643 - val_loss: 0.8058
Epoch 2/1000
3285/3285 [==============================] - 2s 526us/step - loss: 0.7637 - val_loss: 0.8044
...
What I would like is to have output like this (every 10 epochs):
Epoch 1/1000
3285/3285 [==============================] - 2s 618us/step - loss: 0.7458 - val_loss: 0.8107
Epoch 10/1000
3285/3285 [==============================] - 2s 516us/step - loss: 0.7411 - val_loss: 0.8047
Epoch 20/1000
3285/3285 [==============================] - 2s 588us/step - loss: 0.7430 - val_loss: 0.8020
I think I'm supposed to use on_batch_begin in the callback, but I'm not sure what goes inside it.
Thanks
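One possible approach, as a minimal sketch rather than a definitive answer: silence Keras's built-in output with verbose=0 and print a summary line from an epoch-level hook instead of on_batch_begin. The helper names below (should_report, format_epoch_line, make_every_n_callback) are hypothetical; only tf.keras.callbacks.LambdaCallback and its on_epoch_end argument come from Keras. TensorFlow is imported lazily so the two formatting helpers can be tried on their own.

```python
def should_report(epoch, every, total):
    """Return True for the first epoch, every `every`-th epoch, and the last one.

    `epoch` is zero-based, as Keras passes it to on_epoch_end.
    """
    number = epoch + 1  # human-readable, one-based epoch number
    return number == 1 or number % every == 0 or number == total


def format_epoch_line(epoch, total, logs):
    """Build a one-line summary from the metrics dict Keras hands to the hook."""
    metrics = " - ".join(f"{name}: {value:.4f}" for name, value in (logs or {}).items())
    return f"Epoch {epoch + 1}/{total} - {metrics}"


def make_every_n_callback(every, total):
    """Wrap the helpers in a Keras callback (hypothetical helper, needs TensorFlow)."""
    import tensorflow as tf  # imported here so the helpers above work without it

    def on_epoch_end(epoch, logs=None):
        if should_report(epoch, every, total):
            print(format_epoch_line(epoch, total, logs))

    return tf.keras.callbacks.LambdaCallback(on_epoch_end=on_epoch_end)


# The helpers can be checked without training anything:
reported = [e + 1 for e in range(30) if should_report(e, every=10, total=30)]
print(reported)
print(format_epoch_line(9, 500, {"loss": 0.7411, "val_loss": 0.8047}))
```

Usage would then look like model.fit(X_train, y_train, batch_size=16, epochs=epochs, validation_data=(X_test, y_test), verbose=0, callbacks=[make_every_n_callback(10, epochs)]). Note that this drops the progress bar entirely and only prints the per-epoch summary line.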

Keras training with validation data displays multiple progress bars and losses per epoch

I'm using Keras '2.2.4' and TensorFlow '1.13.1' and training a MobileNetV2 on a dataset of images with 2 classes.
When I fit the model on both train-set and validation-set I get two distinct progress bars and multiple losses.
history = model.fit(
    data.flow(train_x, train_y, batch_size=BS),
    steps_per_epoch=len(train_x) // BS,
    validation_data=(val_x, val_y),
    validation_steps=len(val_x) // BS,
    epochs=EPOCHS)
Epoch 1/10
153/153 [==============================] - 0s 1ms/sample - loss: 0.0134 - acc: 1.0000
12/12 [==============================] - 2s 202ms/step - loss: 0.1543 - acc: 0.9859 - val_loss: 0.0130 - val_acc: 1.0000
Epoch 2/10
153/153 [==============================] - 0s 1ms/sample - loss: 0.0026 - acc: 1.0000
12/12 [==============================] - 2s 194ms/step - loss: 0.0590 - acc: 0.9803 - val_loss: 0.0026 - val_acc: 1.0000
While if I don't add validation_data=... to model.fit:
history = model.fit(
    data.flow(train_x, train_y, batch_size=BS),
    steps_per_epoch=len(train_x) // BS,
    epochs=EPOCHS)
Epoch 1/10
12/12 [==============================] - 2s 186ms/step - loss: 0.0404 - acc: 0.9887
Epoch 2/10
12/12 [==============================] - 2s 187ms/step - loss: 0.0189 - acc: 0.9944
Epoch 3/10
12/12 [==============================] - 2s 189ms/step - loss: 0.0137 - acc: 0.9972
I get only one progress bar, which obviously doesn't show validation loss and metrics.
My question is: why? I remember that usually only one progress bar is shown, with both train and validation loss and metrics. What is the second progress bar shown in the first example?

Keras: val_loss is increasing and evaluate loss is too high

I'm new to Keras and I'm using it to build a plain neural network to classify digits from the MNIST dataset.
Beforehand I split the data into 3 parts: 55000 samples to train, 5000 to validate and 10000 to test, and I scaled the pixel values down (by dividing them by 255.0).
My model looks like this:
model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[28,28]))
model.add(keras.layers.Dense(100, activation='relu'))
model.add(keras.layers.Dense(10, activation='softmax'))
And here is the compile:
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='Adam',
              metrics=['accuracy'])
I train the model:
his = model.fit(xTrain, yTrain, epochs = 20, validation_data=(xValid, yValid))
At first the val_loss decreases, then it increases although the accuracy is increasing.
Train on 55000 samples, validate on 5000 samples
Epoch 1/20
55000/55000 [==============================] - 5s 91us/sample - loss: 0.2822 - accuracy: 0.9199 - val_loss: 0.1471 - val_accuracy: 0.9588
Epoch 2/20
55000/55000 [==============================] - 5s 82us/sample - loss: 0.1274 - accuracy: 0.9626 - val_loss: 0.1011 - val_accuracy: 0.9710
Epoch 3/20
55000/55000 [==============================] - 5s 83us/sample - loss: 0.0899 - accuracy: 0.9734 - val_loss: 0.0939 - val_accuracy: 0.9742
Epoch 4/20
55000/55000 [==============================] - 5s 84us/sample - loss: 0.0674 - accuracy: 0.9796 - val_loss: 0.0760 - val_accuracy: 0.9770
Epoch 5/20
55000/55000 [==============================] - 5s 94us/sample - loss: 0.0541 - accuracy: 0.9836 - val_loss: 0.0842 - val_accuracy: 0.9742
Epoch 15/20
55000/55000 [==============================] - 4s 82us/sample - loss: 0.0103 - accuracy: 0.9967 - val_loss: 0.0963 - val_accuracy: 0.9788
Epoch 16/20
55000/55000 [==============================] - 5s 84us/sample - loss: 0.0092 - accuracy: 0.9973 - val_loss: 0.0956 - val_accuracy: 0.9774
Epoch 17/20
55000/55000 [==============================] - 5s 82us/sample - loss: 0.0081 - accuracy: 0.9977 - val_loss: 0.0977 - val_accuracy: 0.9770
Epoch 18/20
55000/55000 [==============================] - 5s 85us/sample - loss: 0.0076 - accuracy: 0.9977 - val_loss: 0.1057 - val_accuracy: 0.9760
Epoch 19/20
55000/55000 [==============================] - 5s 83us/sample - loss: 0.0063 - accuracy: 0.9980 - val_loss: 0.1108 - val_accuracy: 0.9774
Epoch 20/20
55000/55000 [==============================] - 5s 85us/sample - loss: 0.0066 - accuracy: 0.9980 - val_loss: 0.1056 - val_accuracy: 0.9768
And when I evaluate the loss is too high:
model.evaluate(xTest, yTest)
Result:
10000/10000 [==============================] - 0s 41us/sample - loss: 25.7150 - accuracy: 0.9740
[25.714989705941953, 0.974]
Is this ok, or is it a sign of overfitting? Should I do something to improve it? Thanks in advance.
Usually, it is not OK. You want the loss to be as small as possible. Your result is typical of overfitting: your network 'knows' its training data but isn't capable of analysing new images. You may want to add some layers, such as convolutional layers or a Dropout layer. Another idea would be to augment your training images; the ImageDataGenerator class provided by Keras might help you here.
Another thing to look at is your hyperparameters. Why do you use 100 nodes in the first dense layer? Something like 784 (28*28) may be more interesting if you want to start with a dense layer. I would suggest some combination of Convolutional-Dropout-Dense layers; then your dense layer may not need that many nodes.
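The pattern in the question's log, where val_loss climbs while val_accuracy stays roughly flat, can be illustrated with a small worked example. This is a sketch with made-up probabilities, not data from the question: accuracy only looks at which class gets the highest probability, while sparse categorical cross-entropy also penalizes how confident a wrong prediction is. The function and variable names are hypothetical.

```python
import math


def sparse_categorical_crossentropy(probs, labels):
    """Mean of -log(p[true class]) over the samples."""
    return sum(-math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class is the true class."""
    hits = sum(1 for p, y in zip(probs, labels) if p.index(max(p)) == y)
    return hits / len(labels)


labels = [0, 1, 0, 1]

# Made-up "early" model: right on 3 of 4 samples, never very confident.
early = [[0.7, 0.3], [0.3, 0.7], [0.7, 0.3], [0.6, 0.4]]

# Made-up "late" model: the same 3 of 4 right, but very confident everywhere,
# including on the one sample it gets wrong.
late = [[0.99, 0.01], [0.01, 0.99], [0.99, 0.01], [0.999, 0.001]]

for name, probs in (("early", early), ("late", late)):
    print(f"{name}: acc={accuracy(probs, labels):.2f} "
          f"loss={sparse_categorical_crossentropy(probs, labels):.4f}")
```

Both models classify the same three samples correctly, yet the late one has a much higher mean loss because of its single confident mistake. That is one way the validation loss can rise over epochs without the validation accuracy dropping.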

Keras training progress bar on one line with epoch number

When I use Keras to train a model with model.fit(), I see a progress bar that looks like this:
Epoch 1/10
8000/8000 [==========] - 55s 7ms/step - loss: 0.9318 - acc: 0.0783 - val_loss: 0.8631 - val_acc: 0.1180
Epoch 2/10
8000/8000 [==========] - 55s 7ms/step - loss: 0.6587 - acc: 0.1334 - val_loss: 0.7052 - val_acc: 0.1477
Epoch 3/10
8000/8000 [==========] - 54s 7ms/step - loss: 0.5701 - acc: 0.1526 - val_loss: 0.6445 - val_acc: 0.1632
To improve readability, I would like to have the epoch number on the same line as the progress bar, like this:
Epoch 1/10: 8000/8000 [==========] - 55s 7ms/step - loss: 0.9318 - acc: 0.0783 - val_loss: 0.8631 - val_acc: 0.1180
Epoch 2/10: 8000/8000 [==========] - 55s 7ms/step - loss: 0.6587 - acc: 0.1334 - val_loss: 0.7052 - val_acc: 0.1477
Epoch 3/10: 8000/8000 [==========] - 54s 7ms/step - loss: 0.5701 - acc: 0.1526 - val_loss: 0.6445 - val_acc: 0.1632
How can I make that change? I know that Keras has callbacks that can be invoked during training, but I am not familiar with how that works.
If you want to use an alternative, you could use tqdm (version >= 4.41.0):
from tqdm.keras import TqdmCallback
...
model.fit(..., verbose=0, callbacks=[TqdmCallback(verbose=2)])
This turns off Keras's own progress output (verbose=0) and uses tqdm instead. For the callback, verbose=2 means separate progress bars for epochs and batches, 1 means batch bars are cleared when done, and 0 means only epochs are shown (batch bars never appear).
Yes, you can use callbacks (https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/Callback). For example:
import tensorflow as tf

class PrintLogs(tf.keras.callbacks.Callback):
    def __init__(self, epochs):
        super().__init__()
        self.epochs = epochs

    def set_params(self, params):
        # Hack: overwrite the shared params, intended to make Keras skip its own "Epoch x/y" header
        params['epochs'] = 0

    def on_epoch_begin(self, epoch, logs=None):
        # No trailing newline, so the metrics printed by Keras follow on the same line
        print('Epoch %d/%d' % (epoch + 1, self.epochs), end='')

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

epochs = 5
model.fit(x_train, y_train,
          epochs=epochs,
          validation_split=0.2,
          verbose=2,  # one line per epoch, no progress bar
          callbacks=[PrintLogs(epochs)])
output:
Train on 48000 samples, validate on 12000 samples
Epoch 1/5 - 10s - loss: 0.0306 - acc: 0.9901 - val_loss: 0.0837 - val_acc: 0.9786
Epoch 2/5 - 9s - loss: 0.0269 - acc: 0.9910 - val_loss: 0.0839 - val_acc: 0.9788
Epoch 3/5 - 9s - loss: 0.0253 - acc: 0.9915 - val_loss: 0.0895 - val_acc: 0.9781
Epoch 4/5 - 9s - loss: 0.0201 - acc: 0.9930 - val_loss: 0.0871 - val_acc: 0.9792
Epoch 5/5 - 9s - loss: 0.0206 - acc: 0.9931 - val_loss: 0.0917 - val_acc: 0.9793

Keras anomaly in its training time

I am using Keras in multi-GPU mode, with the TensorFlow backend on 2 GPUs. I am using a generator (keras.utils.Sequence) to load my data in batches (BS = 64). Therefore I am using the fit_generator method, providing it with my train and validation data and steps.
I noticed a strange behaviour starting from the 2nd epoch on. Basically, the first 3 steps of each epoch are completed in just 8/9 seconds each, then the network starts taking longer and longer (as it should do). Logs are the following:
Epoch 00001: val_acc improved from -inf to 0.46875, saving model to data/subs_best_model.h5
Epoch 2/32
1/29 [>.............................] - ETA: 8s - loss: 1.0664 - acc: 0.5000
2/29 [=>............................] - ETA: 8s - loss: 1.1384 - acc: 0.4531
3/29 [==>...........................] - ETA: 9s - loss: 1.0915 - acc: 0.5052
4/29 [===>..........................] - ETA: 42:03 - loss: 1.1064 - acc: 0.5117
5/29 [====>.........................] - ETA: 56:02 - loss: 1.1173 - acc: 0.4969
6/29 [=====>........................] - ETA: 1:03:13 - loss: 1.0964 - acc: 0.4974
7/29 [======>.......................] - ETA: 1:06:45 - loss: 1.0740 - acc: 0.5067
8/29 [=======>......................] - ETA: 1:08:35 - loss: 1.0592 - acc: 0.5195
9/29 [========>.....................] - ETA: 1:08:53 - loss: 1.0580 - acc: 0.5191
Do you know what could cause this anomaly/strange behaviour?
EDIT:
My DataGenerator is inspired by this implementation
The code I use for the fit_generator is as follows:
params = {'batch_size': TrainConfig.BATCH_SIZE,
          'dim': (TrainConfig.BATCH_SIZE, 1, TrainConfig.SAMPLES),
          'labels_dim': (TrainConfig.BATCH_SIZE,),
          'n_classes': TrainConfig.OUTPUT_DIM}
training_generator = DataGenerator(train_set, **params)
validation_generator = DataGenerator(val_set, **params)
training_steps_per_epoch = int(1.*len(train_set) / batch_size)
validation_steps_per_epoch = int(1.*len(val_set) / batch_size)
history = model.fit_generator(generator=training_generator,
                              verbose=1,
                              use_multiprocessing=False,
                              workers=1,
                              steps_per_epoch=training_steps_per_epoch,
                              epochs=epochs,
                              validation_data=validation_generator,
                              validation_steps=validation_steps_per_epoch,
                              callbacks=callbacks)
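When reading the epoch 2 log in the question, it may help to convert the ETA column back into elapsed time. This is a rough sketch that assumes the progress bar estimates ETA as (average time per completed step) times (remaining steps), which I believe matches the Keras progress bar of that era but is still an assumption; the helper names are hypothetical. Under that assumption, an ETA of 8s at step 1/29 means the step itself took well under a second, not 8 seconds.

```python
def parse_eta(text):
    """Convert an ETA string such as '8s', '42:03' or '1:03:13' to seconds."""
    if text.endswith("s"):
        return int(text[:-1])
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def elapsed_from_eta(step, total_steps, eta_seconds):
    """Invert eta = (elapsed / step) * (total_steps - step) to recover elapsed time."""
    return eta_seconds * step / (total_steps - step)


# (step, ETA) pairs copied from the epoch 2 log in the question, 29 steps per epoch.
log = [(1, "8s"), (2, "8s"), (3, "9s"), (4, "42:03"), (5, "56:02"), (6, "1:03:13")]

previous = 0.0
for step, eta in log:
    elapsed = elapsed_from_eta(step, 29, parse_eta(eta))
    print(f"step {step}: ~{elapsed:.1f}s elapsed, ~{elapsed - previous:.1f}s for this step")
    previous = elapsed
```

If the assumption holds, the first three steps each finish in a fraction of a second and every later step takes several minutes, so the thing to explain is why steps 1 to 3 are so fast, or what changes at step 4. The printed ETA is rounded to whole seconds, so the small values are only rough.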
