I'm training a ResNet model to classify car brands.
I saved the weights during training for every epoch.
For a test, I stopped the training at epoch 3.
# checkpoint = ModelCheckpoint("best_model.hdf5", monitor='loss', verbose=1)
checkpoint_path = "weights/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(
checkpoint_path, verbose=1,
# Save weights, every epoch.
save_freq='epoch')
model.save_weights(checkpoint_path.format(epoch=0))
history = model.fit_generator(
training_set,
validation_data = test_set,
epochs = 50,
steps_per_epoch = len(training_set),
validation_steps = len(test_set),
callbacks = [cp_callback]
)
However, when I load the saved model, I am unsure whether training resumes from the last saved epoch, since the progress output shows Epoch 1/50 again. Below is the code I use to load the last saved model:
from keras.models import Sequential, load_model
# load the model
new_model = load_model('./weights/cp-0003.ckpt')
# fit the model
history = new_model.fit_generator(
training_set,
validation_data = test_set,
epochs = 50,
steps_per_epoch = len(training_set),
validation_steps = len(test_set),
callbacks = [cp_callback]
)
This is what it looks like:
[Screenshot: resuming from the saved weights starts at Epoch 1/50 again]
Can someone please help?
You can use the initial_epoch argument of fit_generator. It defaults to 0, but you can set it to any positive number, typically the number of epochs already completed:
import os
import tensorflow as tf
from tensorflow.keras.models import load_model
checkpoint_path = "weights/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(
checkpoint_path, verbose=1,
# Save weights, every epoch.
save_freq='epoch')
model.save_weights(checkpoint_path.format(epoch=0))
history = model.fit_generator(
training_set,
validation_data=test_set,
epochs=3,
steps_per_epoch=len(training_set),
validation_steps=len(test_set),
callbacks = [cp_callback]
)
new_model = load_model('./weights/cp-0003.ckpt')
# fit the model
history = new_model.fit_generator(
training_set,
validation_data=test_set,
epochs=50,
steps_per_epoch=len(training_set),
validation_steps=len(test_set),
callbacks=[cp_callback],
initial_epoch=3
)
This will train your model for 50 - 3 = 47 additional epochs.
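A pure-Python sketch of the epoch arithmetic, assuming Keras's usual behaviour: fit() loops over range(initial_epoch, epochs), and both the progress bar and the {epoch} placeholder in ModelCheckpoint filenames use the 1-based epoch + 1, so the first resumed epoch is shown as Epoch 4/50 and saved as cp-0004.ckpt. The helper latest_epoch is hypothetical; it only parses the cp-{epoch:04d}.ckpt names used above so that initial_epoch does not have to be hard-coded.

```python
import re


def resumed_epoch_labels(initial_epoch, epochs):
    """Progress labels Keras prints when resuming: the loop runs over
    range(initial_epoch, epochs) and displays the 1-based epoch number."""
    return [f"Epoch {epoch + 1}/{epochs}" for epoch in range(initial_epoch, epochs)]


def latest_epoch(filenames, pattern=r"cp-(\d{4})\.ckpt"):
    """Hypothetical helper: highest epoch number found in checkpoint names such as
    'cp-0003.ckpt' or 'cp-0003.ckpt.index'; returns 0 if nothing matches."""
    found = []
    for name in filenames:
        match = re.search(pattern, name)
        if match:
            found.append(int(match.group(1)))
    return max(found, default=0)


# Example directory listing (made up); 'checkpoint' is ignored by the pattern.
files = ["cp-0000.ckpt.index", "cp-0001.ckpt", "cp-0002.ckpt", "cp-0003.ckpt", "checkpoint"]
start = latest_epoch(files)
labels = resumed_epoch_labels(start, 50)
print(start, len(labels), labels[0], labels[-1])  # 3 47 Epoch 4/50 Epoch 50/50
```

In a real script you would call latest_epoch(os.listdir(checkpoint_dir)) and pass the result as initial_epoch.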
Some remarks regarding your code if you use TensorFlow 2.x:
fit_generator is deprecated, since fit now supports generators.
You should change your imports from keras... to tensorflow.keras...
As far as I understand, model.fit(epochs=NUM_EPOCHS) does not reset metrics for each epoch. My code for metrics and model.fit() looks like this (simplified):
import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras import applications
NUM_CLASSES = 4
INPUT_SHAPE = (256, 256, 3)
MODELS = {
'DenseNet121': applications.DenseNet121,
'DenseNet169': applications.DenseNet169
}
REDUCE_LR_PATIENCE = 2
REDUCE_LR_FACTOR = 0.7
EARLY_STOPPING_PATIENCE = 4
for modelName, model in MODELS.items():
    loadedModel = model(include_top=False, weights='imagenet',
                        pooling='avg', input_shape=INPUT_SHAPE)
    sequentialModel = tf.keras.models.Sequential()
    sequentialModel.add(loadedModel)
    sequentialModel.add(tf.keras.layers.Dense(NUM_CLASSES, activation='softmax'))
    aucCurve = tf.keras.metrics.AUC(curve='ROC', multi_label=True)
    categoricalAccuracy = tf.keras.metrics.CategoricalAccuracy()
    F1Score = tfa.metrics.F1Score(num_classes=NUM_CLASSES, average='macro', threshold=None)
    metrics = [aucCurve, categoricalAccuracy, F1Score]
    sequentialModel.compile(metrics=metrics)  # loss and optimizer omitted for brevity
    callbacks = [
        tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', patience=REDUCE_LR_PATIENCE, verbose=1, factor=REDUCE_LR_FACTOR),
        tf.keras.callbacks.EarlyStopping(monitor='val_loss', verbose=1, patience=EARLY_STOPPING_PATIENCE),
        tf.keras.callbacks.ModelCheckpoint(filepath=modelName + '_epoch-{epoch:02d}.h5', monitor='val_loss', save_best_only=False, verbose=1),
        tf.keras.callbacks.CSVLogger(modelName + '_training.csv')]
    # training and validation data omitted for brevity
    sequentialModel.fit(epochs=NUM_EPOCHS, callbacks=callbacks)
Perhaps I could reset the metrics by looping over range(NUM_EPOCHS) and re-creating the metrics inside the loop, but I am not sure that is a good solution. Also, I use ModelCheckpoint and CSVLogger callbacks, which rely on the epoch number from model.fit(), so a manual loop would not really work.
Do you have any suggestions on how to reset metrics for each epoch? Is doing a for loop in range of NUM_EPOCHS the only solution here? Thank you.
No, metrics are computed per epoch: fit() resets every metric's state at the start of each epoch. They are not accumulated across epochs; within an epoch, they are accumulated over that epoch's batches. The metrics keep improving from one epoch to the next simply because your model is being trained.
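To make the per-epoch behaviour concrete, here is a toy, pure-Python stand-in for a stateful Keras metric (the method names update_state, result and reset_state mirror tf.keras.metrics.Metric, but this is not TensorFlow code). The loop imitates what fit() does: reset the metric at the start of each epoch, accumulate over that epoch's batches, and report one value per epoch. The batch values are made up.

```python
class StreamingMean:
    """Toy stand-in for a stateful Keras metric (plain Python, not TensorFlow)."""

    def __init__(self):
        self.reset_state()

    def reset_state(self):
        # Clear the accumulated state, as Keras does at the start of each epoch.
        self.total = 0.0
        self.count = 0

    def update_state(self, values):
        # Accumulate every sample of the batch.
        self.total += sum(values)
        self.count += len(values)

    def result(self):
        return self.total / self.count if self.count else 0.0


def run_epochs(batches_per_epoch):
    """Mimic fit(): reset at the start of each epoch, accumulate over its batches,
    and record one epoch-level value."""
    metric = StreamingMean()
    history = []
    for batches in batches_per_epoch:
        metric.reset_state()
        for batch in batches:
            metric.update_state(batch)
        history.append(metric.result())
    return history


# Hypothetical per-sample "correct" flags (1 = correct): two epochs of two batches each.
epochs = [
    [[0, 1, 0, 1], [1, 0, 0, 0]],  # epoch 1: 3 of 8 correct
    [[1, 1, 0, 1], [1, 1, 1, 0]],  # epoch 2: 6 of 8 correct
]
print(run_epochs(epochs))  # [0.375, 0.75]
```

Without the reset_state() call at the top of each epoch, the second value would be 9/16 instead of 6/8, i.e. an average over both epochs.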
I am building a CNN in Keras with a TensorFlow backend for speaker identification. I am currently trying to train the model and then save it as an .hdf5 file. The program trains the model for up to 100 epochs with early stopping and checkpointing, saving only the best model to a file, as shown in the code below:
class BuildModel:
    # Create First Model in Ensemble
    def createModel(self, model_input, n_outputs, first_session=True):
        if first_session != True:
            model = load_model('SI_ideal_model_fixed.hdf5')
            return model
        # Define Input Layer
        inputs = model_input
        # Define Densely Connected Layers
        conv = Dense(16, activation='relu')(inputs)
        conv = Dense(64, activation='relu')(conv)
        conv = Dense(16, activation='relu')(conv)
        conv = Reshape((conv.shape[1]*conv.shape[2]*conv.shape[3],))(conv)
        outputs = Dense(n_outputs, activation='softmax')(conv)
        # Create Model
        model = Model(inputs, outputs)
        model.summary()
        return model
    # Train the Model
    def evaluateModel(self, x_train, x_val, y_train, y_val, num_classes, first_session=True):
        # Model Parameters
        verbose, epochs, batch_size, patience = 1, 100, 64, 10
        # Determine Input and Output Dimensions
        x = x_train[0].shape[0] # Number of MFCC rows
        y = x_train[0].shape[1] # Number of MFCC columns
        c = 1 # Number of channels
        # Create Model
        inputs = Input(shape=(x, y, c), name='input')
        model = self.createModel(model_input=inputs,
                                 n_outputs=num_classes,
                                 first_session=first_session)
        # Compile Model
        model.compile(loss='categorical_crossentropy',
                      optimizer='adam',
                      metrics=['accuracy'])
        # Callbacks
        es = EarlyStopping(monitor='val_loss',
                           mode='min',
                           verbose=verbose,
                           patience=patience,
                           min_delta=0.0001) # Stop training at right time
        mc = ModelCheckpoint('SI_ideal_model_fixed.hdf5',
                             monitor='val_accuracy',
                             verbose=verbose,
                             save_best_only=True,
                             mode='max') # Save best model after each epoch
        reduce_lr = ReduceLROnPlateau(monitor='val_loss',
                                      factor=0.2,
                                      patience=patience//2,
                                      min_lr=1e-3) # Reduce learning rate once learning stagnates
        # Evaluate Model
        model.fit(x_train, y=y_train, epochs=epochs,
                  callbacks=[es,mc,reduce_lr], batch_size=batch_size,
                  validation_data=(x_val, y_val))
        accuracy = model.evaluate(x=x_train, y=y_train,
                                  batch_size=batch_size,
                                  verbose=verbose)
        # Load Best Model
        model = load_model('SI_ideal_model_fixed.hdf5')
        return (accuracy[1], model)
However, load_model does not seem to work properly: the model reached a validation accuracy of 0.56193 at the end of the first training session, but the second training session started with a validation accuracy of only 0.2508. (From what I have seen, the first epoch of the second training session should have a validation accuracy much closer to that of the best model.)
Moreover, when I tested the trained model with model.predict on six unseen samples, it misclassified all six, often with high confidence, which leads me to believe that it was using minimally trained (or untrained) weights.
So my question is: could this be caused by how I save and load the model with ModelCheckpoint and load_model? If so, what is the best alternative? If not, what are some good troubleshooting tips for improving the model's predictions?
I am not sure what you mean by a training session. What I would do is first train for a few epochs and note the validation accuracy. Then, load the saved model and use evaluate() on the same validation data; you should get the same accuracy. If it differs, then something is indeed wrong with your loading. Here is what I would do:
class BuildModel:
    def createModel(self, model_input, n_outputs):
        # Define Input Layer
        inputs = model_input
        # Define Densely Connected Layers
        conv = Dense(16, activation='relu')(inputs)
        conv2 = Dense(64, activation='relu')(conv)
        conv3 = Dense(16, activation='relu')(conv2)
        # Flatten conv3, using the shape of the tensor actually being reshaped
        conv4 = Reshape((conv3.shape[1]*conv3.shape[2]*conv3.shape[3],))(conv3)
        outputs = Dense(n_outputs, activation='softmax')(conv4)
        # Create Model
        model = Model(inputs, outputs)
        return model
    # Train the Model
    def evaluateModel(self, x_train, x_val, y_train, y_val, num_classes, first_session=True):
        # Model Parameters
        verbose, epochs, batch_size, patience = 1, 100, 64, 10
        # Determine Input and Output Dimensions
        x = x_train[0].shape[0] # Number of MFCC rows
        y = x_train[0].shape[1] # Number of MFCC columns
        c = 1 # Number of channels
        # Create Model
        inputs = Input(shape=(x, y, c), name='input')
        model = self.createModel(model_input=inputs,
                                 n_outputs=num_classes)
        # Compile Model
        model.compile(loss='categorical_crossentropy',
                      optimizer='adam',
                      metrics=['accuracy'])
        # Callbacks
        es = EarlyStopping(monitor='val_loss',
                           mode='min',
                           verbose=verbose,
                           patience=patience,
                           min_delta=0.0001) # Stop training at right time
        mc = ModelCheckpoint('SI_ideal_model_fixed.h5',
                             monitor='val_accuracy',
                             verbose=verbose,
                             save_best_only=True,
                             save_weights_only=False) # Save best model after each epoch
        reduce_lr = ReduceLROnPlateau(monitor='val_loss',
                                      factor=0.2,
                                      patience=patience//2,
                                      min_lr=1e-3) # Reduce learning rate once learning stagnates
        # Train for only a few epochs for this check
        model.fit(x_train, y=y_train, epochs=5,
                  callbacks=[es,mc,reduce_lr], batch_size=batch_size,
                  validation_data=(x_val, y_val))
        # Evaluate the in-memory model (weights from the last epoch)
        model.evaluate(x=x_val, y=y_val,
                       batch_size=batch_size,
                       verbose=verbose)
        # Load Best Model and evaluate it on the same validation data
        model2 = load_model('SI_ideal_model_fixed.h5')
        accuracy = model2.evaluate(x=x_val, y=y_val,
                                   batch_size=batch_size,
                                   verbose=verbose)
        return (accuracy[1], model2)
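The loaded model's evaluation should match the best val_accuracy printed during training. The two evaluate() calls print the same thing only if the last epoch was also the best one: with save_best_only=True the file holds the best epoch's weights, while the in-memory model keeps the last epoch's weights.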
P.S. I used distinct names for the intermediate tensors (conv, conv2, conv3, conv4) so that the chain of layers is easier to follow; reassigning a single Python variable builds the same graph.
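A toy, pure-Python model of which epoch ModelCheckpoint(save_best_only=True, mode='max') ends up holding, assuming (as in the code above) that EarlyStopping does not restore the best weights. The val_accuracy values are made up; the point is that the file and the in-memory model can hold weights from different epochs.

```python
def checkpointed_epoch(val_accuracy):
    """Return the 1-based epoch whose weights a save_best_only=True, mode='max'
    checkpoint would hold after training (toy model of the callback)."""
    best_value = float("-inf")
    best_epoch = None
    for epoch, value in enumerate(val_accuracy, start=1):
        if value > best_value:  # only a strictly better value overwrites the file
            best_value, best_epoch = value, epoch
    return best_epoch


# Hypothetical val_accuracy over 5 epochs, best at epoch 3.
history = [0.31, 0.48, 0.56, 0.52, 0.50]
saved = checkpointed_epoch(history)
in_memory = len(history)  # fit() leaves the last epoch's weights in the model
print(saved, in_memory)  # 3 5
```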
I have been training a Keras model for 150 epochs:
import tensorflow
from tensorflow.keras.models import load_model
tbCallBack = tensorflow.keras.callbacks.TensorBoard(log_dir='./Graph', histogram_freq=0, write_graph=True, write_images=True)
my_model.fit(X_train, X_train,
epochs=200,
batch_size=100,
shuffle=True,
validation_data = (X_test, X_test),
callbacks=[tbCallBack]
)
# Save the model
my_model.save('my_model.hdf5')
Then, I will load the Keras model
my_model = load_model("my_model.hdf5")
Is there a way to also load the logs of all epochs (loss, accuracy, ...)?
You can use the Keras callback CSVLogger.
According to the documentation, it streams the results of each epoch to a CSV file.
This is the example from its documentation:
from keras.callbacks import CSVLogger
csv_logger = CSVLogger('training.log')
model.fit(X_train, Y_train, callbacks=[csv_logger])
You can then read it like any normal CSV file (for example with the csv module or pandas) and manipulate it as needed.
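A minimal sketch of reading such a log back with Python's csv module, assuming the default comma separator. The log contents below are made up for illustration (loss and val_loss, as in the autoencoder above); the resulting dictionary has the same shape as history.history. If you resume training later, CSVLogger(..., append=True) keeps appending to the existing file instead of overwriting it.

```python
import csv
import io

# Hypothetical contents of a file written by CSVLogger('training.log'):
# a header row, then one row per epoch (epochs are numbered from 0).
log_text = """epoch,loss,val_loss
0,0.0912,0.0885
1,0.0643,0.0671
2,0.0521,0.0580
"""


def load_epoch_logs(handle):
    """Read a CSVLogger file into {column name: [value for each epoch]}."""
    logs = {}
    for row in csv.DictReader(handle):
        for key, value in row.items():
            logs.setdefault(key, []).append(float(value))
    return logs


# With a real file: with open('training.log', newline='') as f: logs = load_epoch_logs(f)
logs = load_epoch_logs(io.StringIO(log_text))
print(logs["loss"])  # [0.0912, 0.0643, 0.0521]
```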
I am having trouble fine tuning an Inception model with Keras.
I have managed to use tutorials and documentation to generate a model of fully connected top layers that classifies my dataset into their proper categories with an accuracy over 99% using bottleneck features from Inception.
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dropout, Flatten, Dense
from keras import applications
# dimensions of our images.
img_width, img_height = 150, 150
#paths for saving weights and finding datasets
top_model_weights_path = 'Inception_fc_model_v0.h5'
train_data_dir = '../data/train2'
validation_data_dir = '../data/train2'
#training related parameters?
inclusive_images = 1424
nb_train_samples = 1424
nb_validation_samples = 1424
epochs = 50
batch_size = 16
def save_bottlebeck_features():
    datagen = ImageDataGenerator(rescale=1. / 255)
    # build bottleneck features
    model = applications.inception_v3.InceptionV3(include_top=False, weights='imagenet', input_shape=(img_width,img_height,3))
    generator = datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode='categorical',
        shuffle=False)
    bottleneck_features_train = model.predict_generator(
        generator, nb_train_samples // batch_size)
    np.save('bottleneck_features_train', bottleneck_features_train)
    generator = datagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode='categorical',
        shuffle=False)
    bottleneck_features_validation = model.predict_generator(
        generator, nb_validation_samples // batch_size)
    np.save('bottleneck_features_validation', bottleneck_features_validation)
def train_top_model():
    train_data = np.load('bottleneck_features_train.npy')
    train_labels = np.array(range(inclusive_images))
    validation_data = np.load('bottleneck_features_validation.npy')
    validation_labels = np.array(range(inclusive_images))
    print('base size ', train_data.shape[1:])
    model = Sequential()
    model.add(Flatten(input_shape=train_data.shape[1:]))
    model.add(Dense(1000, activation='relu'))
    model.add(Dense(inclusive_images, activation='softmax'))
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer='Adam',
                  metrics=['accuracy'])
    proceed = True
    #model.load_weights(top_model_weights_path)
    while proceed:
        history = model.fit(train_data, train_labels,
                            epochs=epochs,
                            batch_size=batch_size)#,
                            #validation_data=(validation_data, validation_labels), verbose=1)
        if history.history['acc'][-1] > .99:
            proceed = False
    model.save_weights(top_model_weights_path)
save_bottlebeck_features()
train_top_model()
Epoch 50/50
1424/1424 [==============================] - 17s 12ms/step - loss: 0.0398 - acc: 0.9909
I have also been able to stack this model on top of inception to create my full model and use that full model to successfully classify my training set.
from keras import Model
from keras import optimizers
from keras.callbacks import EarlyStopping
img_width, img_height = 150, 150
top_model_weights_path = 'Inception_fc_model_v0.h5'
train_data_dir = '../data/train2'
validation_data_dir = '../data/train2'
#how many inclusive examples do we have?
inclusive_images = 1424
nb_train_samples = 1424
nb_validation_samples = 1424
epochs = 50
batch_size = 16
# build the complete network for evaluation
base_model = applications.inception_v3.InceptionV3(weights='imagenet', include_top=False, input_shape=(img_width,img_height,3))
top_model = Sequential()
top_model.add(Flatten(input_shape=base_model.output_shape[1:]))
top_model.add(Dense(1000, activation='relu'))
top_model.add(Dense(inclusive_images, activation='softmax'))
top_model.load_weights(top_model_weights_path)
#combine base and top model
fullModel = Model(inputs=base_model.input, outputs=top_model(base_model.output))
#predict with the full training dataset
results = fullModel.predict_generator(ImageDataGenerator(rescale=1. / 255).flow_from_directory(
train_data_dir,
target_size=(img_width, img_height),
batch_size=batch_size,
class_mode='categorical',
shuffle=False))
Inspecting the results of this full model shows that they match the accuracy of the fully connected model trained on the bottleneck features.
import matplotlib.pyplot as plt
import operator
#retrieve what the softmax based class assignments would be from results
resultMaxClassIDs = [ max(enumerate(result), key=operator.itemgetter(1))[0] for result in results]
#resultMaxClassIDs should be equal to range(inclusive_images) so we subtract the two and plot the log of the absolute value
#looking for spikes that indicate the values aren't equal
plt.plot([np.log(np.abs(x)+10) for x in (np.array(resultMaxClassIDs) - np.array(range(inclusive_images)))])
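As a side note, the enumerate/itemgetter list comprehension above is just an argmax, so NumPy can compute the predicted class IDs and list the mismatching images directly instead of reading spikes off a plot. The softmax outputs below are made up; the assumption, as in the comment above, is that image i belongs to class i.

```python
import operator

import numpy as np

# Hypothetical softmax outputs for 4 images and 4 classes (one row per image).
results = np.array([
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.80, 0.05, 0.05],
    [0.20, 0.60, 0.10, 0.10],  # image 2 is predicted as class 1
    [0.01, 0.01, 0.01, 0.97],
])

# The original list comprehension ...
loop_ids = [max(enumerate(r), key=operator.itemgetter(1))[0] for r in results]
# ... is equivalent to a vectorised argmax over the class axis.
argmax_ids = np.argmax(results, axis=1)
assert argmax_ids.tolist() == loop_ids

# Image i should be class i, so mismatches can be listed directly.
expected = np.arange(len(results))
mismatches = np.flatnonzero(argmax_ids != expected)
print(mismatches.tolist(), (argmax_ids == expected).mean())  # [2] 0.75
```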
Here is the problem:
When I take this full model and attempt to train it, the training accuracy drops to 0 even though the validation accuracy remains above 99%.
model2 = fullModel
for layer in model2.layers[:-2]:
    layer.trainable = False
# compile the model with a SGD/momentum optimizer
# and a very slow learning rate.
#model.compile(loss='binary_crossentropy', optimizer=optimizers.SGD(lr=1e-4, momentum=0.9), metrics=['accuracy'])
model2.compile(loss='categorical_crossentropy',
optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
metrics=['accuracy'])
train_datagen = ImageDataGenerator(rescale=1. / 255)
test_datagen = ImageDataGenerator(rescale=1. / 255)
train_generator = train_datagen.flow_from_directory(
train_data_dir,
target_size=(img_height, img_width),
batch_size=batch_size,
class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(
validation_data_dir,
target_size=(img_height, img_width),
batch_size=batch_size,
class_mode='categorical')
callback = [EarlyStopping(monitor='acc', min_delta=0, patience=3, verbose=0, mode='auto', baseline=None)]
# fine-tune the model
model2.fit_generator(
#train_generator,
validation_generator,
steps_per_epoch=nb_train_samples//batch_size,
validation_steps = nb_validation_samples//batch_size,
epochs=epochs,
validation_data=validation_generator)
Epoch 1/50
89/89 [==============================] - 388s 4s/step - loss: 13.5787 - acc: 0.0000e+00 - val_loss: 0.0353 - val_acc: 0.9937
and it gets worse as training progresses:
Epoch 21/50
89/89 [==============================] - 372s 4s/step - loss: 7.3850 - acc: 0.0035 - val_loss: 0.5813 - val_acc: 0.8272
The only explanation I can think of is that the training labels are somehow assigned incorrectly in this last training run, but I have successfully done this before with similar code using VGG16.
I have searched over the code trying to find a discrepancy to explain why a model making accurate predictions over 99% of the time drops its training accuracy while maintaining validation accuracy during fine tuning, but I can't figure it out. Any help would be appreciated.
Information about the code and environment:
Things that are going to stand out as weird, but are meant to be that way:
There is only 1 image per class. This NN is intended to classify objects whose environmental and orientation conditions are controlled. There is only one acceptable image for each class, corresponding to the correct environmental and rotational situation.
The test and validation sets are the same. This NN is only ever designed to be used on the classes it is being trained on. The images it will process will be carbon copies of the class examples. It is my intent to overfit the model to these classes.
I am using:
Windows 10
Python 3.5.6 under Anaconda client 1.6.14
Keras 2.2.2
Tensorflow 1.10.0 as the backend
CUDA 9.0
CuDNN 8.0
I have checked out:
Keras accuracy discrepancy in fine-tuned model
VGG16 Keras fine tuning: low accuracy
Keras: model accuracy drops after reaching 99 percent accuracy and loss 0.01
Keras inception v3 retraining and finetuning error
How to find which version of TensorFlow is installed in my system?
but they appear unrelated.
Note: Since your problem is a bit strange and difficult to debug without your trained model and dataset, this answer is just a best guess after considering many things that could have gone wrong. Please provide your feedback, and I will delete this answer if it does not work.
Since InceptionV3 contains BatchNormalization layers, the problem may be caused by the (somewhat ambiguous or unexpected) behavior of this layer when its trainable attribute is set to False (1, 2, 3, 4).
Now, let's check whether this is the root of the problem: as suggested by @fchollet, set the learning phase when defining the model for fine-tuning:
from keras import backend as K
K.set_learning_phase(0)
base_model = applications.inception_v3.InceptionV3(weights='imagenet', include_top=False, input_shape=(img_width,img_height,3))
for layer in base_model.layers:
    layer.trainable = False
K.set_learning_phase(1)
top_model = Sequential()
top_model.add(Flatten(input_shape=base_model.output_shape[1:]))
top_model.add(Dense(1000, activation='relu'))
top_model.add(Dense(inclusive_images, activation='softmax'))
top_model.load_weights(top_model_weights_path)
#combine base and top model
fullModel = Model(inputs=base_model.input, outputs=top_model(base_model.output))
fullModel.compile(loss='categorical_crossentropy',
optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
metrics=['accuracy'])
#####################################################################
# Here, define the generators and then fit the model same as before #
#####################################################################
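For context, here is a simplified NumPy sketch of why the learning phase matters (not the Keras implementation; it ignores the moving-average updates). In training mode a BatchNormalization layer normalizes with the current batch's mean and variance, while in inference mode it uses the moving averages stored in the pre-trained weights, so the same frozen layer can produce different outputs in the two phases. The inputs and stored statistics below are made up. As far as I know, in TF 2.x tf.keras setting trainable = False on a BatchNormalization layer also makes it run in inference mode, which standalone Keras 2.2.2 did not do.

```python
import numpy as np


def batch_norm(x, gamma, beta, moving_mean, moving_var, training, eps=1e-3):
    """Simplified BatchNormalization forward pass, normalizing over axis 0 (the batch)."""
    if training:
        mean, var = x.mean(axis=0), x.var(axis=0)  # statistics of the current batch
    else:
        mean, var = moving_mean, moving_var  # statistics stored from pre-training
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


# Hypothetical single-feature batch whose statistics differ from the stored ones.
x = np.array([[2.0], [4.0], [6.0], [8.0]])
gamma, beta = np.ones(1), np.zeros(1)
moving_mean, moving_var = np.array([0.0]), np.array([1.0])

train_out = batch_norm(x, gamma, beta, moving_mean, moving_var, training=True)
infer_out = batch_norm(x, gamma, beta, moving_mean, moving_var, training=False)
print(train_out.ravel().round(3))
print(infer_out.ravel().round(3))
```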
Side Note: This is not causing any problem in your case, but keep in mind that when you use top_model(base_model.output) the whole Sequential model (i.e. top_model) is stored as one layer of fullModel. You can verify this by either using fullModel.summary() or print(fullModel.layers[-1]). Hence when you used:
for layer in model2.layers[:-2]:
    layer.trainable = False
you are actually not freezing the last layer of base_model. However, since it is a Concatenate layer and therefore has no trainable parameters, no problem occurs and the model behaves as you intended.
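A plain-Python illustration of the slicing, using a shortened, hypothetical list of layer names (the real fullModel has many more layers): layers[:-2] stops just before mixed10, whereas layers[:-1] would freeze every base_model layer and leave only the top_model layer trainable.

```python
# Hypothetical, shortened layer list of fullModel: InceptionV3 layers ending in the
# Concatenate layer 'mixed10', followed by the whole top_model as ONE layer.
layer_names = ["input_1", "conv2d_1", "batch_normalization_1", "mixed9", "mixed10", "sequential_1"]

frozen_by_minus_2 = layer_names[:-2]  # what the loop in the question freezes
frozen_by_minus_1 = layer_names[:-1]  # every base_model layer; only top_model stays trainable

print(frozen_by_minus_2[-1], frozen_by_minus_1[-1])  # mixed9 mixed10
```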
Like the previous reply, I'll try to share some thoughts to see whether it helps.
A couple of things caught my attention (and may be worth reviewing). Note: some of them should have caused issues with the separate models as well.
Correct me if I'm wrong, but it seems you used sparse_categorical_crossentropy for the first training and categorical_crossentropy for the second one. Is that right? I believe they expect labels in different formats: the sparse version expects integer class indices, while the other expects one-hot vectors.
Have you tried explicitly setting the layers you added at the end to trainable = True? I know you have already set the others to trainable = False, but it may be worth checking too.
It seems the data generator is not using the preprocessing function that Inception v3 was trained with (keras.applications.inception_v3.preprocess_input), which scales pixels to the range [-1, 1] instead of [0, 1].
Have you tried any experiments using the Functional API instead of the Sequential API?
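A small NumPy sketch of the first and third points, with made-up probabilities: the two cross-entropy losses give the same value as long as each receives labels in its own format (integers for the sparse version, one-hot vectors for the other), and Inception-style scaling maps pixels to [-1, 1], whereas rescale=1./255 maps them to [0, 1]. In Keras, the Inception function can be plugged in with ImageDataGenerator(preprocessing_function=preprocess_input), using preprocess_input from keras.applications.inception_v3 instead of rescale=1./255; this sketch only reimplements the scaling formula.

```python
import numpy as np


def categorical_crossentropy(one_hot, probs):
    """Mean cross-entropy with one-hot targets (the format categorical_crossentropy expects)."""
    return float(-np.mean(np.sum(one_hot * np.log(probs), axis=1)))


def sparse_categorical_crossentropy(labels, probs):
    """Same loss with integer class indices (the format sparse_categorical_crossentropy expects)."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))


probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])    # integer labels, as in train_top_model
one_hot = np.eye(3)[labels]  # one-hot labels, as produced by class_mode='categorical'
assert np.isclose(categorical_crossentropy(one_hot, probs),
                  sparse_categorical_crossentropy(labels, probs))


def inception_style_scaling(pixels):
    """Scale [0, 255] pixel values to [-1, 1], the formula Inception v3 preprocessing uses."""
    return pixels / 127.5 - 1.0


pixels = np.array([0.0, 127.5, 255.0])
print(pixels / 255.0)                   # rescale=1./255: values in [0, 1]
print(inception_style_scaling(pixels))  # Inception-style: values in [-1, 1]
```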
I hope that helps.