The following code gives a log ending with
Epoch 19/20
1/1 [==============================] - 0s 473ms/step - loss: 1.4018 - accuracy: 0.8750 - val_loss: 1.8656 - val_accuracy: 0.8900
Epoch 20/20
1/1 [==============================] - 0s 444ms/step - loss: 0.5904 - accuracy: 0.8750 - val_loss: 2.1255 - val_accuracy: 0.8700
get_dataset: validation
Found 1000 files belonging to 2 classes.
Using 100 files for validation.
4/4 [==============================] - 1s 81ms/step
eval acc: 0.81
My question is:
Why is the val_accuracy after the last epoch (0.87) different from the eval acc (0.81) after the fit?
In my code, I try to use the same dataset for the validation of each epoch during fit and the additional validation afterwards.
[Update 1, 2022-07-19:
Obviously, the two accuracy calculations don't really use the same data. How can I debug which data is actually used?
[Update 3, 2022-07-20: I have followed the data into TensorFlow. The last thing I see is that in Model.evaluate (during fit) and Model.predict the x.filenames are equal. I did not manage to debug much further, because soon in quick_execute the __inference_test_function_248219 resp. the __inference_predict_function_231438 are evaluated outside Python, and the arguments are tensors with dtype=resource, whose contents I cannot see.]
I have deliberately removed my class balancing code to keep my example small. I know that this makes the accuracies less useful, but I don't care about that for now.
Note that get_dataset('validation') is only called once at the beginning of the fit, not at each epoch.
I have now also set max_queue_size=0, use_multiprocessing=False, workers=0 (as seen here, found via this related SO question about TensorFlow 1), but this did not make the accuracies equal.
]
Code:
import tensorflow as tf
from sklearn.metrics import accuracy_score
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.preprocessing import image_dataset_from_directory
inputs = tf.keras.Input(shape=(224, 224, 3))
base_model = tf.keras.applications.ResNet50(weights='imagenet', include_top=False)
base_output = base_model(inputs)
base_model.trainable = False
out = Flatten(name='flat')(base_output)
out = Dense(1, activation='sigmoid')(out)
model = Model(inputs=inputs, outputs=out)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
def get_dataset(subset):
    print('get_dataset:', subset)
    return image_dataset_from_directory(
        'data-nodup-1000',
        labels="inferred",
        label_mode='binary',
        color_mode="rgb",
        image_size=(224, 224),
        shuffle=True,
        seed=1,
        validation_split=0.1,
        subset=subset,
        crop_to_aspect_ratio=False,
    )
model.fit(
get_dataset('training'),
steps_per_epoch=1,
epochs=20,
validation_data=get_dataset('validation'),
max_queue_size=0,
use_multiprocessing=False,
workers=0,
)
val_dataset = get_dataset('validation')
true_class = tf.concat([y for x, y in val_dataset], axis=0)
pred = model.predict(val_dataset)
pred_class = pred >= .5
print('eval acc:', accuracy_score(true_class, pred_class))
[Update 2, 2022-07-19:
I can also reproduce the behavior with the deprecated ImageDataGenerator, using
from tensorflow.keras.applications.resnet50 import preprocess_input
from keras_preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(
preprocessing_function=preprocess_input,
validation_split=0.1,
)
def get_dataset(subset):
    print('get_dataset:', subset)
    return datagen.flow_from_directory(
        'data-nodup-1000',
        class_mode='binary',
        target_size=(224, 224),
        shuffle=True,
        seed=1,
        subset=subset,
    )
and
true_class = val_dataset.labels
]
[Update 4, 2022-07-21: Note that deactivating shuffling of validation data by setting shuffle=(subset == 'training') makes the two validation accuracies equal. This is not a workaround, however, because the validation set then consists only of class 1, since flow_from_directory doesn't do stratification.
]
My environment:
I am using all up-to-date libraries, like tensorflow 2.9.1 and sklearn 1.1.1 (via pip-compile -U).
The folder data-nodup-1000 contains one subfolder with 113 files of class 0, and one subfolder with 887 files of class 1.
I have now found out that in TensorFlow 2.9.1 model.predict uses the second iteration of the dataset, which is shuffled differently than the first iteration!
It even uses the second iteration when I directly call model.predict(get_dataset('validation'))!
Therefore, the entries of true_class and pred do not match.
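A minimal way to see this reshuffling for yourself (a sketch reusing the get_dataset above; the exact result depends on your data):

val_dataset = get_dataset('validation')
labels_first_pass = tf.concat([y for _, y in val_dataset], axis=0)
labels_second_pass = tf.concat([y for _, y in val_dataset], axis=0)
# with shuffle=True the dataset is reshuffled on every iteration,
# so the two passes generally yield the labels in a different order
print(tf.reduce_all(labels_first_pass == labels_second_pass).numpy())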
Switching to TensorFlow 2.10.0-rc3 and its tf.keras.utils.split_dataset makes the accuracies equal.
Here's the updated code:
import tensorflow as tf
from sklearn.metrics import accuracy_score
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.preprocessing import image_dataset_from_directory
inputs = tf.keras.Input(shape=(224, 224, 3))
base_model = tf.keras.applications.ResNet50(weights='imagenet', include_top=False)
base_output = base_model(inputs)
base_model.trainable = False
out = Flatten(name='flat')(base_output)
out = Dense(1, activation='sigmoid')(out)
model = Model(inputs=inputs, outputs=out)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
dataset = image_dataset_from_directory(
'data-synthetic',
labels="inferred",
label_mode='binary',
color_mode="rgb",
image_size=(224, 224),
shuffle=True,
seed=1,
crop_to_aspect_ratio=False,
)
train_dataset, val_dataset = tf.keras.utils.split_dataset(dataset, right_size=0.1)
model.fit(
train_dataset,
steps_per_epoch=1,
epochs=20,
validation_data=val_dataset,
max_queue_size=0,
use_multiprocessing=False,
workers=0,
)
true_class = tf.concat([y for x, y in val_dataset], axis=0)
pred = model.predict(val_dataset)
pred_class = pred >= .5
print('eval acc:', accuracy_score(true_class, pred_class))
which correctly yields:
Epoch 19/20
1/1 [==============================] - 0s 438ms/step - loss: 0.4426 - accuracy: 0.9062 - val_loss: 0.4658 - val_accuracy: 0.8800
Epoch 20/20
1/1 [==============================] - 0s 444ms/step - loss: 2.1619 - accuracy: 0.8438 - val_loss: 0.5886 - val_accuracy: 0.8900
4/4 [==============================] - 1s 87ms/step
eval acc: 0.89
There are a few points about your data and setup that cause this:
First, your data is highly imbalanced (roughly an 8-to-1 label ratio), which makes the model prone to overfitting and the validation estimate unreliable.
Second, in the get_dataset function, shuffle is set to True, so every call to get_dataset() shuffles your data. Because (1) your validation set is very small and (2) your train/val split is not stratified over the labels, the validation metrics vary a lot due to this shuffling.
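A quick way to check the second point is to count which labels actually end up in the validation split (a small sketch reusing the asker's get_dataset; the exact counts depend on the seed):

val_dataset = get_dataset('validation')
val_labels = tf.concat([y for _, y in val_dataset], axis=0)
n_class_1 = int(tf.reduce_sum(val_labels))
print('class 1:', n_class_1, 'class 0:', len(val_labels) - n_class_1)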
Suggestions to solve this:
Call get_dataset() only once each for the training and the validation dataset before fitting the model, and store them in variables. If there is no sequential order in your data, maybe set shuffle=False.
(Optional) If possible, make your dataset more balanced with techniques such as data augmentation or over-/under-sampling (a class-weight sketch follows the code below).
def get_dataset(subset):
    return image_dataset_from_directory(
        'data-nodup-1000',
        labels="inferred",
        label_mode='binary',
        color_mode="rgb",
        image_size=(224, 224),
        shuffle=False,
        seed=0,
        validation_split=0.1,
        subset=subset,
        crop_to_aspect_ratio=False,
    )
train_dataset = get_dataset('training')
val_dataset = get_dataset('validation')
model.fit(
train_dataset,
steps_per_epoch=1,
epochs=20,
validation_data=val_dataset,
)
true_class = tf.concat([y for x, y in val_dataset], axis=0)
pred = model.predict(val_dataset)
pred_class = pred >= .5
print('eval acc:', accuracy_score(true_class, pred_class))
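For the optional balancing suggestion, one low-effort alternative to resampling is to pass class weights to fit. A rough sketch, assuming the 113 (class 0) / 887 (class 1) counts mentioned in the question:

# hypothetical class weights derived from the class counts in the question
n0, n1 = 113, 887
total = n0 + n1
class_weight = {0: total / (2 * n0), 1: total / (2 * n1)}

model.fit(
    train_dataset,
    epochs=20,
    validation_data=val_dataset,
    class_weight=class_weight,
)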
At my job interview yesterday, I was asked to build a neural network using TensorFlow in Python to classify images from the flowers image dataset.
But even though it should have worked in theory, for some reason I couldn't get the accuracy above the low 20% range.
Python version: 3.8.13
TensorFlow version: 2.4.1
The data preprocessing methods from the interviewer were given as follows:
# create dataset
IMG_SIZE = 160
BATCH_SIZE = 32
AUTOTUNE = tf.data.experimental.AUTOTUNE
def _parse_data(x,y):
    image = tf.io.read_file(x)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.cast(image, dtype=tf.float32)
    image = tf.math.l2_normalize(image)
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    return image,y

def _input_fn(x,y):
    ds = tf.data.Dataset.from_tensor_slices((x,y))
    ds = ds.map(_parse_data)
    ds = ds.shuffle(buffer_size=data_size)
    ds = ds.repeat()
    ds = ds.batch(BATCH_SIZE)
    ds = ds.prefetch(buffer_size=AUTOTUNE)
    return ds
train_ds = _input_fn(x_train, y_train)
validation_ds = _input_fn(x_valid, y_valid)
With both training and validation datasets being
<PrefetchDataset shapes: ((None, 160, 160, 3), (None,)), types:
(tf.float32, tf.int32)>
With the network being as follows:
from tensorflow.keras import datasets, layers, models
model_seq = models.Sequential()
model_seq.add(layers.experimental.preprocessing.RandomFlip("horizontal",input_shape=(IMG_SIZE,IMG_SIZE,3)))
model_seq.add(layers.experimental.preprocessing.RandomRotation(0.2))
model_seq.add(layers.experimental.preprocessing.Rescaling(1./255))
model_seq.add(layers.Conv2D(16, 3, padding='same', activation='relu'))
model_seq.add(layers.MaxPooling2D())
model_seq.add(layers.Conv2D(32, 3, padding='same', activation='relu'))
model_seq.add(layers.MaxPooling2D())
model_seq.add(layers.Conv2D(64, 3, padding='same', activation='relu'))
model_seq.add(layers.MaxPooling2D())
model_seq.add(layers.Dropout(0.2))
model_seq.add(layers.Flatten())
model_seq.add(layers.Dense(128, activation='relu'))
model_seq.add(layers.Dense(len(label_names), activation='softmax'))
model_seq.summary()
The output layer is the only thing that isn't allowed to be changed:
model_seq.add(layers.Dense(len(label_names), activation='softmax'))
(Please note I was for some reason asked to use model_seq.add(), and even though it could be triggering for some of you, please ignore it this once :) )
For compiling the model, I used the following:
model_seq.compile(optimizer="Adam",
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
And for fitting the model:
history = model_seq.fit(train_ds,epochs=20,
validation_data = validation_ds,
steps_per_epoch=100,validation_steps=100)
The things I've tried:
Using different augmentation methods (or removing the whole augmentation section from the network).
Changing the Batch and Image sizes.
Using Dropout layers.
Using early stopping as follows:
callback = tf.keras.callbacks.EarlyStopping(
monitor='val_loss',
min_delta=0, patience=3, verbose=0,
mode='auto',baseline=None,
restore_best_weights=True)
history = model_seq.fit(train_ds,epochs=20,
validation_data = validation_ds,
steps_per_epoch=100,validation_steps=100,
callbacks = [callback])
Yet despite all of the above, I couldn't get any results. Since I couldn't find out what I did wrong exactly, I'm hoping someone here could tell me, so I could learn from this experience.
(Please take into consideration that I wasn't allowed to change the preprocessing functions, with the parameters IMG_SIZE and BATCH_SIZE being the only exception).
TL;DR: If you want to use their preprocessing part and not change anything, go to the First Approach. If you want to augment images, go to the Second Approach and use ImageDataGenerator.
First Approach
As you say, you had to use the preprocessing functions without changing them. Because I don't have access to your data, I use the CIFAR-10 dataset together with your preprocessing part (only the line that reads from a file is changed).
Because you shouldn't change IMG_SIZE=160 in the preprocessing part, I add the layer tf.keras.layers.Lambda(lambda image: tf.image.resize(image, (32, 32))) to the network, because working with large images causes a crash.
You don't need a very large network: first check with a small network, then add parameters and layers step by step.
We can get a better result, like the one below. (On CIFAR-10 with your network I get about 10% accuracy, like you.)
import tensorflow as tf
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
# create dataset
IMG_SIZE = 160
BATCH_SIZE = 32
data_size = 32
AUTOTUNE = tf.data.experimental.AUTOTUNE
def _parse_data(x,y):
    # image = tf.io.read_file(x)  <- removed, because we don't read from a file path
    image = x
    # image = tf.image.decode_jpeg(image, channels=3)  <- removed, because we don't read from a path and don't have JPEGs
    image = tf.cast(image, dtype=tf.float32)
    image = tf.math.l2_normalize(image)
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    return image,y

def _input_fn(x,y):
    ds = tf.data.Dataset.from_tensor_slices((x,y))
    ds = ds.map(_parse_data)
    ds = ds.shuffle(buffer_size=data_size)
    ds = ds.repeat()
    ds = ds.batch(BATCH_SIZE)
    ds = ds.prefetch(buffer_size=AUTOTUNE)
    return ds
train_ds = _input_fn(X_train, y_train)
validation_ds = _input_fn(X_test, y_test)
model = tf.keras.Sequential([
tf.keras.Input(shape=(160, 160, 3)),
tf.keras.layers.Lambda(lambda image: tf.image.resize(image, (32, 32))),
tf.keras.layers.Conv2D(filters=32,kernel_size=(3,3),activation='relu'),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(50,activation='relu'),
tf.keras.layers.Dense(10,activation='softmax')
])
model.compile(optimizer="Adam",
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
metrics=['accuracy'])
history = model.fit(train_ds,epochs=4,
validation_data = validation_ds,
steps_per_epoch=100,validation_steps=100)
Output:
Epoch 1/4
100/100 [==============================] - 8s 38ms/step - loss: 2.2445 - accuracy: 0.1375 - val_loss: 2.1406 - val_accuracy: 0.2138
Epoch 2/4
100/100 [==============================] - 3s 33ms/step - loss: 2.0552 - accuracy: 0.2688 - val_loss: 1.9764 - val_accuracy: 0.3250
Epoch 3/4
100/100 [==============================] - 4s 38ms/step - loss: 1.9468 - accuracy: 0.3022 - val_loss: 1.9014 - val_accuracy: 0.3200
Epoch 4/4
100/100 [==============================] - 4s 36ms/step - loss: 1.8936 - accuracy: 0.3341 - val_loss: 1.8883 - val_accuracy: 0.3419
Second Approach: Augment images with ImageDataGenerator:
import tensorflow as tf
flowers = tf.keras.utils.get_file(
'flower_photos',
'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
untar=True)
data_gen = tf.keras.preprocessing.image.ImageDataGenerator(
rescale=1./255,
rotation_range = 10, # Degree range for random rotations.
horizontal_flip = True, # Randomly flip inputs horizontally.
vertical_flip = True, # Randomly flip inputs vertically.
)
imgs_dataset = data_gen.flow_from_directory(flowers, class_mode='categorical',
target_size=(160, 160), batch_size=32,
shuffle=True)
model = tf.keras.Sequential([
tf.keras.layers.Conv2D(filters=32,kernel_size=(3,3),input_shape=(160, 160, 3),activation='relu'),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(50,activation='relu'),
tf.keras.layers.Dense(5,activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy',metrics=['accuracy'])
model.fit(imgs_dataset,epochs=5)
Output:
Found 3670 images belonging to 5 classes.
Epoch 1/5
115/115 [==============================] - 39s 311ms/step - loss: 1.7904 - accuracy: 0.4161
Epoch 2/5
115/115 [==============================] - 27s 236ms/step - loss: 1.0878 - accuracy: 0.5605
Epoch 3/5
115/115 [==============================] - 28s 244ms/step - loss: 1.0252 - accuracy: 0.6005
Epoch 4/5
115/115 [==============================] - 27s 233ms/step - loss: 0.9735 - accuracy: 0.6196
Epoch 5/5
115/115 [==============================] - 29s 248ms/step - loss: 0.9313 - accuracy: 0.6455
I used the Keras model training API and observed differences when training the model with NumPy arrays (x_train and y_train) and with tf.data.Dataset.from_tensor_slices((x_train, y_train)). A minimal working example is shown below:
import numpy as np
import tensorflow as tf
tf.keras.utils.set_random_seed(0)
n_examples, n_dims = (100, 10)
raw_dataset = np.random.randn(n_examples, n_dims)
model = tf.keras.models.Sequential(
[
tf.keras.layers.Dense(
1024, activation="relu", use_bias=True
),
tf.keras.layers.Dense(
1, activation="linear", use_bias=True
),
]
)
model.compile(
optimizer=tf.keras.optimizers.Adam(),
loss="mse",
)
x_train = raw_dataset[:, :-1]
y_train = raw_dataset[:, -1]
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
n_epochs = 10
batch_size = 16
use_dataset = True
if use_dataset:
    model.fit(
        dataset.batch(batch_size=batch_size),
        epochs=n_epochs,
    )
else:
    model.fit(
        x=x_train,
        y=y_train,
        batch_size=batch_size,
        epochs=n_epochs,
    )
print("Evaluation:")
model.evaluate(x_train, y_train)
model.evaluate(dataset.batch(batch_size=batch_size))
If I run this code with use_dataset = True, the final performance is:
Evaluation:
4/4 [==============================] - 0s 825us/step - loss: 0.4132
7/7 [==============================] - 0s 701us/step - loss: 0.4132
If I run it with use_dataset = False, I get:
Evaluation:
4/4 [==============================] - 0s 855us/step - loss: 0.4219
7/7 [==============================] - 0s 808us/step - loss: 0.4219
I expected the two training loops to perform identically. Interestingly, the model performance is identical if I set batch_size = n_examples. The difference seems to be related to the way that batches are handled internally. Why is this happening? Is it a bug or a feature?
The behavior is due to the default parameter shuffle=True in model.fit(...) and is not a bug. According to the docs regarding shuffle:
Boolean (whether to shuffle the training data before each epoch) or str (for 'batch'). This argument is ignored when x is a generator or an object of tf.data.Dataset. 'batch' is a special option for dealing with the limitations of HDF5 data; it shuffles in batch-sized chunks. Has no effect when steps_per_epoch is not None.
So this parameter is ignored when a tf.data.Dataset is passed, and the data is not reshuffled after each epoch as in the other approach with arrays.
Here is the code to get the same results for both methods:
import numpy as np
import tensorflow as tf
tf.keras.utils.set_random_seed(0)
n_examples, n_dims = (100, 10)
raw_dataset = np.random.randn(n_examples, n_dims)
model = tf.keras.models.Sequential(
[
tf.keras.layers.Dense(
1024, activation="relu", use_bias=True
),
tf.keras.layers.Dense(
1, activation="linear", use_bias=True
),
]
)
model.compile(
optimizer=tf.keras.optimizers.Adam(),
loss="mse",
)
x_train = raw_dataset[:, :-1]
y_train = raw_dataset[:, -1]
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
n_epochs = 10
batch_size = 16
use_dataset = False
if use_dataset:
    model.fit(
        dataset.batch(batch_size=batch_size),
        epochs=n_epochs,
    )
else:
    model.fit(
        x=x_train,
        y=y_train,
        batch_size=batch_size,
        shuffle=False,
        epochs=n_epochs,
    )
print("Evaluation:")
model.evaluate(x_train, y_train)
model.evaluate(dataset.batch(batch_size=batch_size))
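Alternatively, if you want the dataset branch to behave like the NumPy branch with shuffle=True, a sketch would be to shuffle the tf.data.Dataset explicitly before batching (reshuffle_each_iteration=True is the default, so the data is reshuffled every epoch):

shuffled = dataset.shuffle(buffer_size=n_examples, seed=0, reshuffle_each_iteration=True)
model.fit(
    shuffled.batch(batch_size=batch_size),
    epochs=n_epochs,
)

Note that even with a fixed seed, the exact shuffle order differs from Keras's internal array shuffling, so the final losses will be similar but not bit-identical.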
Essence of this question:
I'd like to find a proper way to calculate the Fscore for the validation and training data after each epoch (not batch-wise)
For a binary classification task, I'd like to calculate the Fscore after each epoch using a simple Keras model, but how to calculate the Fscore seems to be quite a discussion.
I know Keras works in batches, and one way to calculate the fscore for each batch would be https://stackoverflow.com/a/45305384/10053244 (Fscore calculation: f1).
The batch-wise calculation can be quite confusing, and I prefer to calculate the Fscore after each epoch. So just calling history.history['f1'] or history.history['val_f1'] does not do the trick, because it shows the batch-wise fscores.
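For reference, the batch-wise f1 metric referenced above (and used in the compile calls below) is roughly the following sketch, adapted from the linked answer using the Keras backend:

from keras import backend as K

def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        return true_positives / (possible_positives + K.epsilon())

    def precision(y_true, y_pred):
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        return true_positives / (predicted_positives + K.epsilon())

    p = precision(y_true, y_pred)
    r = recall(y_true, y_pred)
    return 2 * ((p * r) / (p + r + K.epsilon()))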
I figured one way is to save the model after each epoch using the ModelCheckpoint callback (from keras.callbacks import ModelCheckpoint):
Saving the model weights after every epoch
Reloading the model and using model.evaluate or model.predict
Edit:
Using the TensorFlow backend, I decided to track TruePositives, FalsePositives and FalseNegatives (as umbreon29 suggested).
But now comes the fun part: The results when reloading the model are different for the training data (TP, FP, FN are different) but not for the validation set!
So a simple model that stores the weights, so that each model can be rebuilt and the TP, FN, FP (and finally the Fscore) recalculated, looks like:
from keras.metrics import TruePositives, TrueNegatives, FalseNegatives, FalsePositives
## simple keras model
sequence_input = Input(shape=(input_dim,), dtype='float32')
preds = Dense(1, activation='sigmoid',name='output')(sequence_input)
model = Model(sequence_input, preds)
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=[TruePositives(name='true_positives'),
TrueNegatives(name='true_negatives'),
FalseNegatives(name='false_negatives'),
FalsePositives(name='false_positives'),
f1])
# model checkpoints
filepath="weights-improvement-{epoch:02d}-{val_f1:.2f}.hdf5"
checkpoint = ModelCheckpoint(os.path.join(savemodel,filepath), monitor='val_f1', verbose=1, save_best_only=False, save_weights_only=True, mode='auto')
callbacks_list = [checkpoint]
history = model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=epoch, batch_size=batch,
callbacks=[callbacks_list])
## Saving TP, FN, FP to calculate Fscore
tp.append(history.history['true_positives'])
fp.append(history.history['false_positives'])
fn.append(history.history['false_negatives'])
arr_train = np.stack((tp, fp, fn), axis=1)
## doing the same for tp_val, fp_val, fn_val
[...]
arr_val = np.stack((tp_val, fp_val, fn_val), axis=1)
## the following method just shows batch-wise fscores and shouldn't be used:
## f1_sc.append(history.history['f1'])
Reloading the model after each epoch to calculate the Fscores (the predict method together with sklearn's fscore metric, from sklearn.metrics import f1_score, is equivalent to calculating the fscore metric from TP, FP, FN):
Fscore_val = []
fscorepredict_val_sklearn = []
Fscore_train = []
fscorepredict_train = []
## model_loads contains list of model-paths
for i in model_loads:
    ## rebuilding the model each time since only weights are stored
    sequence_input = Input(shape=(input_dim,), dtype='float32')
    preds = Dense(1, activation='sigmoid',name='output')(sequence_input)
    model = Model(sequence_input, preds)
    model.load_weights(i)
    # Compile model (required to make predictions)
    model.compile(loss='binary_crossentropy',
                  optimizer='adam',
                  metrics=[TruePositives(name='true_positives'),
                           TrueNegatives(name='true_negatives'),
                           FalseNegatives(name='false_negatives'),
                           FalsePositives(name='false_positives'),
                           f1
                           ])

    ### For Validation data
    ## using evaluate
    y_pred = model.evaluate(x_val, y_val, verbose=0)
    Fscore_val.append(y_pred)  ## contains (loss, tp, tn, fn, fp, f1-batchwise)

    ## using predict
    y_pred = model.predict(x_val)
    val_preds = [1 if x > 0.5 else 0 for x in y_pred]
    cm = f1_score(y_val, val_preds)
    fscorepredict_val_sklearn.append(cm)  ## equivalent to the Fscore calculated from Fscore_val's tp, fp, fn

    ### For the training data
    y_pred = model.evaluate(x_train, y_train, verbose=0)
    Fscore_train.append(y_pred)  ## also contains (loss, tp, tn, fn, fp, f1-batchwise)
    y_pred = model.predict(x_train, verbose=0)  # gives probabilities
    train_preds = [1 if x > 0.5 else 0 for x in y_pred]
    cm = f1_score(y_train, train_preds)
    fscorepredict_train.append(cm)
Calculating the Fscore from the tp, fn, and fp using Fscore_val's tp, fp, fn and comparing it to fscorepredict_val_sklearn is equivalent and identical to calculating it from arr_val.
However, the number of tp, fn, and fp is different when comparing Fscore_train and arr_train. Therefore, I also arrive at different Fscores. The number of tp, fn, fp should be the same, but they aren't. Is this a bug?
Which one should I trust? The fscorepredict_train values actually seem more trustworthy, since they start above the "always guessing class 1" Fscore (when recall=1). (fscorepredict_train[0]=0.6784 vs f_hist[0]=0.5736 vs always-guessing-class-1 fscore = 0.6751)
[Note: Fscore_train[0] = [0.6853608025386962, 2220.0, 250.0, 111.0, 1993.0, 0.6730511784553528] (loss, tp, tn, fn, fp, f1-batchwise), leading to fscore = 0.6784, so the Fscore from Fscore_train = fscorepredict_train]
I provide a custom callback that computes the score (in your case the F1 score from sklearn) on ALL the data at the end of each epoch (for the training data and, optionally, the validation data):
import numpy as np
import tensorflow as tf
from sklearn.metrics import f1_score

class F1History(tf.keras.callbacks.Callback):

    def __init__(self, train, validation=None):
        super(F1History, self).__init__()
        self.validation = validation
        self.train = train

    def on_epoch_end(self, epoch, logs={}):
        logs['F1_score_train'] = float('-inf')
        X_train, y_train = self.train[0], self.train[1]
        y_pred = (self.model.predict(X_train).ravel() > 0.5) + 0
        score = f1_score(y_train, y_pred)

        if (self.validation):
            logs['F1_score_val'] = float('-inf')
            X_valid, y_valid = self.validation[0], self.validation[1]
            y_val_pred = (self.model.predict(X_valid).ravel() > 0.5) + 0
            val_score = f1_score(y_valid, y_val_pred)
            logs['F1_score_train'] = np.round(score, 5)
            logs['F1_score_val'] = np.round(val_score, 5)
        else:
            logs['F1_score_train'] = np.round(score, 5)
Here is a dummy example:
x_train = np.random.uniform(0,1, (30,10))
y_train = np.random.randint(0,2, (30))
x_val = np.random.uniform(0,1, (20,10))
y_val = np.random.randint(0,2, (20))
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping

sequence_input = Input(shape=(10,), dtype='float32')
preds = Dense(1, activation='sigmoid',name='output')(sequence_input)
model = Model(sequence_input, preds)
es = EarlyStopping(patience=3, verbose=1, min_delta=0.001, monitor='F1_score_val', mode='max', restore_best_weights=True)
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(x_train,y_train, epochs=10,
callbacks=[F1History(train=(x_train,y_train),validation=(x_val,y_val)),es])
The output prints:
Epoch 1/10
1/1 [==============================] - 0s 78ms/step - loss: 0.7453 - F1_score_train: 0.3478 - F1_score_val: 0.4762
Epoch 2/10
1/1 [==============================] - 0s 57ms/step - loss: 0.7448 - F1_score_train: 0.3478 - F1_score_val: 0.4762
Epoch 3/10
1/1 [==============================] - 0s 58ms/step - loss: 0.7444 - F1_score_train: 0.3478 - F1_score_val: 0.4762
Epoch 4/10
1/1 [==============================] - ETA: 0s - loss: 0.7439Restoring model weights from the end of the best epoch.
1/1 [==============================] - 0s 70ms/step - loss: 0.7439 - F1_score_train: 0.3478 - F1_score_val: 0.4762
I have TF 2.2 and it works without problems; I hope this helps.
I am trying the training and evaluation example on the TensorFlow website.
Specifically, this part:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255
y_train = y_train.astype('float32')
y_test = y_test.astype('float32')
def get_uncompiled_model():
    inputs = keras.Input(shape=(784,), name='digits')
    x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Dense(64, activation='relu', name='dense_2')(x)
    outputs = layers.Dense(10, activation='softmax', name='predictions')(x)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model

def get_compiled_model():
    model = get_uncompiled_model()
    model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
                  loss='sparse_categorical_crossentropy',
                  metrics=['sparse_categorical_accuracy'])
    return model
sample_weight = np.ones(shape=(len(y_train),))
sample_weight[y_train == 5] = 2.
# Create a Dataset that includes sample weights
# (3rd element in the return tuple).
train_dataset = tf.data.Dataset.from_tensor_slices(
(x_train, y_train, sample_weight))
# Shuffle and slice the dataset.
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)
model = get_compiled_model()
model.fit(train_dataset, epochs=3)
It appears that if I add the batch normalization layer (this line: x = layers.BatchNormalization()(x)) I get the following error:
InvalidArgumentError: The second input must be a scalar, but it has shape [64]
[[{{node batch_normalization_2/cond/ReadVariableOp/Switch}}]]
Any ideas?
The same code works for me.
The only lines I changed are:
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3)
to model.compile(optimizer=keras.optimizers.RMSprop(lr=1e-3)
(which is version specific)
Then
model.fit(train_dataset, epochs=3) to model.fit(train_dataset, epochs=3, steps_per_epoch=30)
Reason: when using iterators as input to a model, you should specify the steps_per_epoch argument.
If you just want to use sample weights, you don't have to use tf.data.Dataset, you can simply run:
model.fit(x=x_train, y=y_train, sample_weight=sample_weight, batch_size=64, epochs=3)
and it works for me (when I change learning_rate to lr as #ASHu2 mentioned).
It gets 97% accuracy after 3 epochs:
...
57408/60000 [===========================>..] - ETA: 0s - loss: 0.1010 - sparse_categorical_accuracy: 0.9709
58816/60000 [============================>.] - ETA: 0s - loss: 0.1011 - sparse_categorical_accuracy: 0.9708
60000/60000 [==============================] - 2s 37us/sample - loss: 0.1007 - sparse_categorical_accuracy: 0.9709
I used TF 1.14.0 on windows.
The problem was solved when I updated tensorflow from version 1.14.1 to 2.0.0-rc1.
I am trying to train a regression model of a dummy function with 3 variables with fully connected neural nets in Keras and I always get a training loss much higher than the validation loss.
I split the data set in 2/3 for training and 1/3 for validation. I have tried lots of different things:
changing the architecture
adding more neurons
using regularization
using different batch sizes
Still, the training error is one order of magnitude higher than the validation error:
Epoch 5995/6000
4020/4020 [==============================] - 0s 78us/step - loss: 1.2446e-04 - mean_squared_error: 1.2446e-04 - val_loss: 1.3953e-05 - val_mean_squared_error: 1.3953e-05
Epoch 5996/6000
4020/4020 [==============================] - 0s 98us/step - loss: 1.2549e-04 - mean_squared_error: 1.2549e-04 - val_loss: 1.5730e-05 - val_mean_squared_error: 1.5730e-05
Epoch 5997/6000
4020/4020 [==============================] - 0s 105us/step - loss: 1.2500e-04 - mean_squared_error: 1.2500e-04 - val_loss: 1.4372e-05 - val_mean_squared_error: 1.4372e-05
Epoch 5998/6000
4020/4020 [==============================] - 0s 96us/step - loss: 1.2500e-04 - mean_squared_error: 1.2500e-04 - val_loss: 1.4151e-05 - val_mean_squared_error: 1.4151e-05
Epoch 5999/6000
4020/4020 [==============================] - 0s 80us/step - loss: 1.2487e-04 - mean_squared_error: 1.2487e-04 - val_loss: 1.4342e-05 - val_mean_squared_error: 1.4342e-05
Epoch 6000/6000
4020/4020 [==============================] - 0s 79us/step - loss: 1.2494e-04 - mean_squared_error: 1.2494e-04 - val_loss: 1.4769e-05 - val_mean_squared_error: 1.4769e-05
This makes no sense, please help!
Edit: this is the full code
I have 6000 training examples
# -*- coding: utf-8 -*-
"""
Created on Mon Feb 26 13:40:03 2018
#author: Michele
"""
#from keras.datasets import reuters
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
from keras import optimizers
import matplotlib.pyplot as plt
import os
import pylab
from keras.constraints import maxnorm
from sklearn.model_selection import train_test_split
from keras import regularizers
from sklearn.preprocessing import MinMaxScaler
import math
from sklearn.metrics import mean_squared_error
import keras
# fix random seed for reproducibility
seed=7
np.random.seed(seed)
dataset = np.loadtxt("BabbaX.csv", delimiter=",")
#split into input (X) and output (Y) variables
#x = dataset.transpose()[:,10:15] #only use power
x = dataset
del(dataset) # delete container
dataset = np.loadtxt("BabbaY.csv", delimiter=",")
#split into input (X) and output (Y) variables
y = dataset.transpose()
del(dataset) # delete container
#scale labels from 0 to 1
scaler = MinMaxScaler(feature_range=(0, 1))
y = np.reshape(y, (y.shape[0],1))
y = scaler.fit_transform(y)
lenData=x.shape[0]
x=np.transpose(x)
xtrain=x[:,0:round(lenData*0.67)]
ytrain=y[0:round(lenData*0.67),]
xtest=x[:,round(lenData*0.67):round(lenData*1.0)]
ytest=y[round(lenData*0.67):round(lenData*1.0)]
xtrain=np.transpose(xtrain)
xtest=np.transpose(xtest)
l2_lambda = 0.1 #reg factor
#sequential type of model
model = Sequential()
#stacking layers with .add
units=300
#model.add(Dense(units, input_dim=xtest.shape[1], activation='relu', kernel_initializer='normal', kernel_regularizer=regularizers.l2(l2_lambda), kernel_constraint=maxnorm(3)))
model.add(Dense(units, activation='relu', input_dim=xtest.shape[1]))
#model.add(Dropout(0.1))
model.add(Dense(units, activation='relu'))
#model.add(Dropout(0.1))
model.add(Dense(1)) #no activation function should be used for the output layer
rms = optimizers.RMSprop(lr=0.00001, rho=0.9, epsilon=None, decay=0) #It is recommended to leave the parameters
adam = keras.optimizers.Adam(lr=0.00001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=1e-6, amsgrad=False)
#of this optimizer at their default values (except the learning rate, which can be freely tuned).
#adam=keras.optimizers.Adam(lr=0.01, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
#configure learning process with .compile
model.compile(optimizer=adam, loss='mean_squared_error', metrics=['mse'])
# fit the model (iterate on the training data in batches)
history = model.fit(xtrain, ytrain, nb_epoch=1000, batch_size=round(xtest.shape[0]/100),
validation_data=(xtest, ytest), shuffle=True, verbose=2)
#extract weights for each layer
weights = [layer.get_weights() for layer in model.layers]
#evaluate on training data set
valuesTrain=model.predict(xtrain)
#evaluate on test data set
valuesTest=model.predict(xtest)
#invert predictions
valuesTrain = scaler.inverse_transform(valuesTrain)
ytrain = scaler.inverse_transform(ytrain)
valuesTest = scaler.inverse_transform(valuesTest)
ytest = scaler.inverse_transform(ytest)
TL;DR:
When a model is learning well and quickly, the validation loss can be lower than the training loss, because the validation happens on the updated model, while the training loss is computed with none (full-batch training) or only some (mini-batch training) of that epoch's updates applied.
Okay I think I found out what's happening here. I used the following code to test this.
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt
np.random.seed(7)
N_DATA = 6000
x = np.random.uniform(-10, 10, (3, N_DATA))
y = x[0] + x[1]**2 + x[2]**3
xtrain = x[:, 0:round(N_DATA*0.67)]
ytrain = y[0:round(N_DATA*0.67)]
xtest = x[:, round(N_DATA*0.67):N_DATA]
ytest = y[round(N_DATA*0.67):N_DATA]
xtrain = np.transpose(xtrain)
xtest = np.transpose(xtest)
model = Sequential()
model.add(Dense(10, activation='relu', input_dim=3))
model.add(Dense(5, activation='relu'))
model.add(Dense(1))
adam = keras.optimizers.Adam()
# configure learning process with .compile
model.compile(optimizer=adam, loss='mean_squared_error', metrics=['mse'])
# fit the model (iterate on the training data in batches)
history = model.fit(xtrain, ytrain, nb_epoch=50,
batch_size=round(N_DATA/100),
validation_data=(xtest, ytest), shuffle=False, verbose=2)
plt.plot(history.history['mean_squared_error'])
plt.plot(history.history['val_loss'])
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
This is essentially the same as your code and replicates the problem, which is not actually a problem. Simply change
history = model.fit(xtrain, ytrain, nb_epoch=50,
batch_size=round(N_DATA/100),
validation_data=(xtest, ytest), shuffle=False, verbose=2)
to
history = model.fit(xtrain, ytrain, nb_epoch=50,
batch_size=round(N_DATA/100),
validation_data=(xtrain, ytrain), shuffle=False, verbose=2)
So instead of validating with your validation data, you validate using the training data again, which leads to exactly the same behavior. Weird, isn't it? No, actually not. What I think is happening is:
The initial mean_squared_error given by Keras on every epoch is the loss before the gradients have been applied, while the validation happens after the gradients have been applied, which makes sense.
With highly stochastic problems, for which NNs are usually used, you do not see that, because the data varies so much that the updated weights are simply not good enough to describe the validation data; the slight overfitting effect on the training data is still so much stronger that, even after updating the weights, the validation loss is higher than the training loss from before. That is only how I think it is, though; I might be completely wrong.
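One way to check this explanation (a sketch, not part of the original experiment) is to re-evaluate the training set with the weights as they stand at the end of each epoch and compare that number with the running training loss Keras prints:

from keras.callbacks import LambdaCallback

# re-evaluate the training data with the end-of-epoch weights;
# if the explanation above holds, this value should be much closer to val_loss
# than the running training loss printed during the epoch
eval_train = LambdaCallback(
    on_epoch_end=lambda epoch, logs: print(
        ' train loss with end-of-epoch weights:',
        model.evaluate(xtrain, ytrain, verbose=0)
    )
)

history = model.fit(xtrain, ytrain, epochs=50,
                    batch_size=round(N_DATA/100),
                    validation_data=(xtest, ytest), shuffle=False, verbose=2,
                    callbacks=[eval_train])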
One thing you could try is to increase the size of the training data and reduce the size of the validation data. Then your model will be trained on more samples, which may include some complex samples as well, and can then be validated on the remaining samples. Try something like 80% training and 20% validation, or any other ratio a little higher than what you used previously.
If you don't want to change the size of the training and validation sets, then you can try changing the random seed to some other value, so that you get a training set with different samples, which might help in training the model well.
Check this answer here to get more understanding of the other possible reasons.
Check this link if you want a more detailed explanation with an example. #Michele
If the training loss is a little higher than, or close to, the validation loss, it means that the model is not overfitting.
The effort should always be to make the best use of the features, so as to have less overfitting and better validation and test accuracies.
A probable reason that you always get a higher training loss is the features and data you are using for training.
Please refer to the following link and observe the training and validation loss in the case of dropout:
http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/