I am training a simple neural network with TensorFlow (code below). However, I don't understand why the loss drops to nearly zero after a few batches while the acc values barely increase.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 3327) 0
_________________________________________________________________
dense_2 (Dense) (None, 3) 9984
=================================================================
Total params: 9,984
Trainable params: 9,984
Non-trainable params: 0
_________________________________________________________________
None
Train on 480888 samples, validate on 53433 samples
Epoch 1/20
32/480888 [..............................] - ETA: 1092:47:40 - loss: 0.6549 - acc: 0.3125
224/480888 [..............................] - ETA: 156:05:19 - loss: 0.0936 - acc: 0.3393
576/480888 [..............................] - ETA: 60:40:08 - loss: 0.0364 - acc: 0.3490
1024/480888 [..............................] - ETA: 34:06:04 - loss: 0.0205 - acc: 0.3555
1440/480888 [..............................] - ETA: 24:14:01 - loss: 0.0146 - acc: 0.3604
1856/480888 [..............................] - ETA: 18:47:22 - loss: 0.0113 - acc: 0.3594
...
...
20960/480888 [>.............................] - ETA: 1:36:49 - loss: 9.9997e-04 - acc: 0.3468
21440/480888 [>.............................] - ETA: 1:34:34 - loss: 9.7758e-04 - acc: 0.3465
21888/480888 [>.............................] - ETA: 1:32:34 - loss: 9.5757e-04 - acc: 0.3469
22336/480888 [>.............................] - ETA: 1:30:38 - loss: 9.3837e-04 - acc: 0.3476
22784/480888 [>.............................] - ETA: 1:28:47 - loss: 9.1992e-04 - acc: 0.3477
23264/480888 [>.............................] - ETA: 1:26:53 - loss: 9.0094e-04 - acc: 0.3475
23712/480888 [>.............................] - ETA: 1:25:10 - loss: 8.8392e-04 - acc: 0.3479
The above acc value is evaluated on each current batch, is that correct?
scaler = preprocessing.MinMaxScaler()
scalerMaxAbs = preprocessing.MaxAbsScaler()
training_metadata = scaler.fit_transform(training_data[metadata].astype(np.float32))
testing_metadata = scaler.transform(testing_data[metadata].astype(np.float32))
training_scores = scalerMaxAbs.fit_transform(training_data[scores])
testing_scores = scalerMaxAbs.transform(testing_data[scores])
y_train = np_utils.to_categorical(training_data['label'], num_classes=3)
y_test = np_utils.to_categorical(testing_data['label'], num_classes=3)
training_features = np.concatenate((training_metadata, training_scores), axis=1)
testing_features = np.concatenate((testing_metadata, testing_scores), axis=1)
inputs = Input(shape=(training_features.shape[1],), dtype='float32')
hh_layer = Dense(128, activation=tf.nn.relu)(inputs)
dropout = Dropout(0.2)(hh_layer)
output = Dense(3, activation=tf.nn.softmax)(inputs)
model = Model(inputs=inputs, output=output)
print(model.summary())
early_stopping_monitor = EarlyStopping(patience=3)
adam = Adam(lr=0.001)
model.compile(optimizer=adam,
loss='categorical_crossentropy',
metrics=['accuracy'])
model.fit(training_features, y_train, epochs=20, validation_split=0.1, callbacks=[early_stopping_monitor])
score = model.evaluate(testing_features, y_test)
Result of 4 Epochs:
480888/480888 [==============================] - 332s 691us/step - loss: 4.3699e-05 - acc: 0.3474 - val_loss: 1.1921e-07 - val_acc: 0.3493
480888/480888 [==============================] - 71s 148us/step - loss: 1.1921e-07 - acc: 0.3474 - val_loss: 1.1921e-07 - val_acc: 0.3493
480888/480888 [==============================] - 71s 148us/step - loss: 1.1921e-07 - acc: 0.3474 - val_loss: 1.1921e-07 - val_acc: 0.3493
480888/480888 [==============================] - 71s 147us/step - loss: 1.1921e-07 - acc: 0.3474 - val_loss: 1.1921e-07 - val_acc: 0.3493
Final result on test set after 4 epochs with early stopping
loss, acc: [1.1920930376163597e-07, 0.34758880897839645]
Step 0: save the model, then load it and check its predictions on a few samples to make sure there is no bug in how the loss or acc is computed. If you find nothing wrong, update the question.
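The check suggested above can also be done by hand on a few samples. The following is a minimal NumPy sketch, not the asker's code: `probs` and `y_true` are made-up stand-ins for the output of `model.predict(...)` and the one-hot labels, and the clipping constant `1e-7` is an assumption meant to mirror the usual Keras epsilon.

```python
import numpy as np

def categorical_crossentropy(y_true, probs, eps=1e-7):
    """Mean cross-entropy over samples; probs are clipped to avoid log(0)."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(probs), axis=1)))

def accuracy(y_true, probs):
    """Fraction of samples whose argmax prediction matches the one-hot label."""
    return float(np.mean(np.argmax(probs, axis=1) == np.argmax(y_true, axis=1)))

# Hypothetical data: 3 samples, 3 classes.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.float64)
probs = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.6, 0.3, 0.1]])

loss_value = categorical_crossentropy(y_true, probs)
acc_value = accuracy(y_true, probs)
print("loss:", loss_value)
print("acc:", acc_value)

# An all-zero row in y_true contributes zero loss whatever the prediction,
# so checking the label row sums is a cheap sanity test.
print("label row sums:", y_true.sum(axis=1))
```

As a side note, the plateau value 1.1921e-07 in the log equals float32 machine epsilon (`np.finfo(np.float32).eps`), which is consistent with a loss that has hit a numerical floor rather than one that is still being optimized.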
I am new to Keras and have been practicing with resources from the web. Unfortunately, I cannot build a model without it throwing the following error:
ValueError: logits and labels must have the same shape, received ((None, 10) vs (None, 1)).
I have attempted the following:
DF = pd.read_csv("https://raw.githubusercontent.com/EpistasisLab/tpot/master/tutorials/MAGIC%20Gamma%20Telescope/MAGIC%20Gamma%20Telescope%20Data.csv")
X = DF.iloc[:,0:-1]
y = DF.iloc[:,-1]
yBin = np.array([1 if x == 'g' else 0 for x in y ])
scaler = StandardScaler()
X1 = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X1, yBin, test_size=0.25, random_state=2018)
print(X_train.__class__,X_test.__class__,y_train.__class__,y_test.__class__ )
model=Sequential()
model.add(Dense(6,activation="relu", input_shape=(10,)))
model.add(Dense(10,activation="softmax"))
model.build(input_shape=(None,1))
model.summary()
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['accuracy'])
model.fit(x=X_train,
y=y_train,
epochs=600,
validation_data=(X_test, y_test), verbose=1
)
I have read that my model is likely wrong in terms of input parameters. What is the correct approach?
When I look at the shape of your data
print(X_train.shape,X_test.shape,y_train.shape,y_test.shape)
I see that X is 10-dimensional and y is 1-dimensional.
Therefore, you need a 10-dimensional input
model.build(input_shape=(None,10))
and a 1-dimensional output in the last dense layer (with a sigmoid activation, since softmax over a single unit always outputs 1)
model.add(Dense(1,activation="sigmoid"))
The target variable yBin/y_train/y_test is a 1D array (it has shape (None,1) for a given batch).
Your logits come from the Dense layer and the last Dense layer has 10 neurons with softmax activation. So it will give 10 outputs for each input or (batch_size,10) for each batch. This is represented formally as (None,10).
To resolve the shape mismatch in question, change the neuron count of the last dense layer to 1 and set the activation function to "sigmoid".
model.add(Dense(1,activation="sigmoid"))
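To illustrate the shape argument above, here is a small NumPy sketch that is independent of Keras (the logits and labels are made-up values). It shows that a softmax over a single unit always returns 1.0, whereas a sigmoid over one unit gives a probability of shape `(batch_size, 1)` that lines up with 0/1 labels.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax along the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y, p, eps=1e-7):
    """Mean binary cross-entropy; p is clipped to avoid log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

logits = np.array([[-2.0], [0.0], [3.0]])    # hypothetical outputs, shape (3, 1)
labels = np.array([0, 1, 1]).reshape(-1, 1)  # hypothetical 0/1 labels, shape (3, 1)

single_unit_softmax = softmax(logits)  # every entry is 1.0: carries no information
probs = sigmoid(logits)                # one probability per sample

print(single_unit_softmax.ravel())
print(probs.shape, labels.shape)
bce = binary_crossentropy(labels, probs)
print(bce)
```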
As correctly mentioned by @MSS, you need to use a sigmoid activation function with 1 neuron in the last dense layer to match the logits with the labels (1, 0) of your dataset, which indicate a binary class.
Fixed code:
model=Sequential()
model.add(Dense(6,activation="relu", input_shape=(10,)))
model.add(Dense(1,activation="sigmoid"))
#model.build(input_shape=(None,1))
model.summary()
model.compile(optimizer='rmsprop',loss='binary_crossentropy',metrics=['accuracy'])
model.fit(x=X_train,y=y_train,epochs=10,validation_data=(X_test, y_test),verbose=1)
Output:
Epoch 1/10
446/446 [==============================] - 3s 4ms/step - loss: 0.5400 - accuracy: 0.7449 - val_loss: 0.4769 - val_accuracy: 0.7800
Epoch 2/10
446/446 [==============================] - 2s 4ms/step - loss: 0.4425 - accuracy: 0.7987 - val_loss: 0.4241 - val_accuracy: 0.8095
Epoch 3/10
446/446 [==============================] - 2s 3ms/step - loss: 0.4082 - accuracy: 0.8175 - val_loss: 0.4034 - val_accuracy: 0.8242
Epoch 4/10
446/446 [==============================] - 2s 3ms/step - loss: 0.3934 - accuracy: 0.8286 - val_loss: 0.3927 - val_accuracy: 0.8313
Epoch 5/10
446/446 [==============================] - 2s 4ms/step - loss: 0.3854 - accuracy: 0.8347 - val_loss: 0.3866 - val_accuracy: 0.8320
Epoch 6/10
446/446 [==============================] - 2s 4ms/step - loss: 0.3800 - accuracy: 0.8397 - val_loss: 0.3827 - val_accuracy: 0.8364
Epoch 7/10
446/446 [==============================] - 2s 4ms/step - loss: 0.3762 - accuracy: 0.8411 - val_loss: 0.3786 - val_accuracy: 0.8387
Epoch 8/10
446/446 [==============================] - 2s 3ms/step - loss: 0.3726 - accuracy: 0.8432 - val_loss: 0.3764 - val_accuracy: 0.8404
Epoch 9/10
446/446 [==============================] - 2s 3ms/step - loss: 0.3695 - accuracy: 0.8466 - val_loss: 0.3724 - val_accuracy: 0.8408
Epoch 10/10
446/446 [==============================] - 2s 4ms/step - loss: 0.3665 - accuracy: 0.8478 - val_loss: 0.3698 - val_accuracy: 0.8454
<keras.callbacks.History at 0x7f68ca30f670>
I'm having trouble getting my model to converge. It is based on a paper I found that uses an SVM on top of a ResNet, but it's just not working. I read that RandomFourierFeatures can be used as a quasi-substitute for an SVM in Keras.
# Instantiate ResNet 50 architecture
with strategy.scope():
    t = tf.keras.Input(shape=(256,256,3))
    basemodel = ResNet50(
        include_top=False,
        input_tensor=t,
        weights='imagenet'
    )
# Create ResNET 50 (RGB Channel)
# Pretrained on ImageNet
# Input: RGB Image ==> Output: 2048 element vector
with strategy.scope():
    rgb_model = basemodel.output
    rgb_model = AveragePooling2D(pool_size=(7,7))(rgb_model)
    rgb_model = Flatten()(rgb_model)
    rgb_model = Dense(1000)(rgb_model)
    rgb_model = RandomFourierFeatures(output_dim=2048, scale=5.0, kernel_initializer="gaussian", trainable=True)(rgb_model)
    rgb_model = Dense(len(classes), activation="linear")(rgb_model)
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
    model = tf.keras.Model(inputs=basemodel.input, outputs=rgb_model)
    model.compile(optimizer=optimizer,
                  loss='hinge',
                  metrics=[tf.keras.metrics.CategoricalAccuracy(name="acc")])
history = model.fit(train_dataset,
                    epochs=epochs,
                    steps_per_epoch=steps_per_epoch,
                    validation_data=val_dataset,
                    validation_steps=validation_steps)
This is the output I receive
Epoch 1/50
2/234 [..............................] - ETA: 44:41 - loss: 1.4583 - acc: 0.0781 WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0051s vs `on_train_batch_end` time: 0.0790s). Check your callbacks.
WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0051s vs `on_train_batch_end` time: 0.0790s). Check your callbacks.
234/234 [==============================] - ETA: 0s - loss: 1.3060 - acc: 0.0452WARNING:tensorflow:Callbacks method `on_test_batch_end` is slow compared to the batch time (batch time: 0.0045s vs `on_test_batch_end` time: 0.0343s). Check your callbacks.
WARNING:tensorflow:Callbacks method `on_test_batch_end` is slow compared to the batch time (batch time: 0.0045s vs `on_test_batch_end` time: 0.0343s). Check your callbacks.
234/234 [==============================] - 75s 320ms/step - loss: 1.3060 - acc: 0.0452 - val_loss: 1.1811 - val_acc: 0.0365
Epoch 2/50
234/234 [==============================] - 21s 91ms/step - loss: 1.1190 - acc: 0.0527 - val_loss: 1.0879 - val_acc: 0.0469
Epoch 3/50
234/234 [==============================] - 21s 92ms/step - loss: 1.0570 - acc: 0.0513 - val_loss: 1.0394 - val_acc: 0.0521
Epoch 4/50
234/234 [==============================] - 21s 91ms/step - loss: 1.0192 - acc: 0.0536 - val_loss: 1.0011 - val_acc: 0.0938
Epoch 5/50
234/234 [==============================] - 21s 91ms/step - loss: 1.0005 - acc: 0.0612 - val_loss: 1.0003 - val_acc: 0.0729
Epoch 6/50
234/234 [==============================] - 21s 92ms/step - loss: 1.0003 - acc: 0.0612 - val_loss: 1.0002 - val_acc: 0.0521
Epoch 7/50
234/234 [==============================] - 22s 92ms/step - loss: 1.0002 - acc: 0.0646 - val_loss: 1.0001 - val_acc: 0.0573
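One observation about the log above: the loss settles at almost exactly 1.0. For the standard hinge loss, max(0, 1 - y * s) with targets y in {-1, +1}, a score of s = 0 gives a loss of exactly 1 for every entry, so a plateau at 1.0 is what one would see if the outputs of the final layer had collapsed towards zero. This is only one possible reading, not a confirmed diagnosis. The sketch below uses NumPy and made-up values; mapping one-hot 0/1 labels to -1/+1 is an assumption about how the labels are treated.

```python
import numpy as np

def hinge(y_true_pm1, scores):
    """Mean hinge loss for targets in {-1, +1}."""
    return float(np.mean(np.maximum(0.0, 1.0 - y_true_pm1 * scores)))

# Hypothetical one-hot labels for 2 samples and 3 classes, mapped to -1/+1.
one_hot = np.array([[1, 0, 0], [0, 0, 1]], dtype=np.float64)
y_pm1 = 2.0 * one_hot - 1.0

zero_scores = np.zeros_like(one_hot)  # outputs that collapsed to zero
good_scores = 2.0 * y_pm1             # correct sign with a margin above 1

loss_at_zero = hinge(y_pm1, zero_scores)
loss_with_margin = hinge(y_pm1, good_scores)
print(loss_at_zero)
print(loss_with_margin)
```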
I am using Keras with TensorFlow backend to train an LSTM network for some time-sequential data sets. The performance seems pretty good when I represent my training data (as well as the validation data) in the Numpy array format:
train_x.shape: (128346, 10, 34)
val_x.shape: (7941, 10, 34)
test_x.shape: (24181, 10, 34)
train_y.shape: (128346, 2)
val_y.shape: (7941, 2)
test_y.shape: (24181, 2)
P.S.: 10 is the number of time steps and 34 is the number of features; the labels are one-hot encoded.
model = tf.keras.Sequential()
model.add(layers.LSTM(_HIDDEN_SIZE, return_sequences=True,
input_shape=(_TIME_STEPS, _FEATURE_DIMENTIONS)))
model.add(layers.Dropout(0.4))
model.add(layers.LSTM(_HIDDEN_SIZE, return_sequences=True))
model.add(layers.Dropout(0.3))
model.add(layers.TimeDistributed(layers.Dense(_NUM_CLASSES)))
model.add(layers.Flatten())
model.add(layers.Dense(_NUM_CLASSES, activation='softmax'))
opt = tf.keras.optimizers.Adam(lr = _LR)
model.compile(optimizer = opt, loss = 'categorical_crossentropy',
metrics = ['accuracy'])
model.fit(train_x,
train_y,
epochs=_EPOCH,
batch_size = _BATCH_SIZE,
verbose = 1,
validation_data = (val_x, val_y)
)
And the training results are:
Train on 128346 samples, validate on 7941 samples
Epoch 1/10
128346/128346 [==============================] - 50s 390us/step - loss: 0.5883 - acc: 0.6975 - val_loss: 0.5242 - val_acc: 0.7416
Epoch 2/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.4804 - acc: 0.7687 - val_loss: 0.4265 - val_acc: 0.8014
Epoch 3/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.4232 - acc: 0.8076 - val_loss: 0.4095 - val_acc: 0.8096
Epoch 4/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.3894 - acc: 0.8276 - val_loss: 0.3529 - val_acc: 0.8469
Epoch 5/10
128346/128346 [==============================] - 49s 382us/step - loss: 0.3610 - acc: 0.8430 - val_loss: 0.3283 - val_acc: 0.8593
Epoch 6/10
128346/128346 [==============================] - 49s 382us/step - loss: 0.3402 - acc: 0.8525 - val_loss: 0.3334 - val_acc: 0.8558
Epoch 7/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.3233 - acc: 0.8604 - val_loss: 0.2944 - val_acc: 0.8741
Epoch 8/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.3087 - acc: 0.8663 - val_loss: 0.2786 - val_acc: 0.8805
Epoch 9/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.2969 - acc: 0.8709 - val_loss: 0.2785 - val_acc: 0.8777
Epoch 10/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.2867 - acc: 0.8757 - val_loss: 0.2590 - val_acc: 0.8877
This log looks pretty normal, but when I use the TensorFlow Dataset API to represent my data sets, the training process behaves very strangely (it seems that the model overfits or underfits?):
def tfdata_generator(features, labels, is_training = False, batch_size = _BATCH_SIZE, epoch = _EPOCH):
    dataset = tf.data.Dataset.from_tensor_slices((features, tf.cast(labels, dtype = tf.uint8)))
    if is_training:
        dataset = dataset.shuffle(10000) # depends on sample size
    dataset = dataset.batch(batch_size, drop_remainder = True).repeat(epoch).prefetch(batch_size)
    return dataset
training_set = tfdata_generator(train_x, train_y, is_training=True)
validation_set = tfdata_generator(val_x, val_y, is_training=False)
testing_set = tfdata_generator(test_x, test_y, is_training=False)
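A note on `dataset.shuffle(10000)` in the function above: `tf.data` shuffles through a fixed-size buffer, so with a buffer smaller than the dataset an element can only move a limited distance towards the front of the stream. Whether that matters here depends on how `train_x` is ordered, which the question does not say. The following pure-Python sketch imitates the buffer mechanism; it is a simplified model written for illustration, not TensorFlow's implementation, and `buffered_shuffle` is a hypothetical helper.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Simplified model of a fixed-size shuffle buffer (illustration only)."""
    rng = random.Random(seed)
    buffer, out = [], []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # fill the buffer first
            continue
        idx = rng.randrange(buffer_size)
        out.append(buffer[idx])          # emit a random buffered element
        buffer[idx] = item               # and replace it with the new one
    rng.shuffle(buffer)                  # drain what is left in random order
    out.extend(buffer)
    return out

samples = list(range(1000))              # stand-in for sample indices
shuffled = buffered_shuffle(samples, buffer_size=10)

# How far ahead of its output position can an element come from?
max_lookahead = max(value - position for position, value in enumerate(shuffled))
print("max lookahead:", max_lookahead)   # always below buffer_size
```

If the training arrays happen to be sorted by label, a buffer at least as large as the dataset (here `len(train_x)`) gives a uniform shuffle.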
Training on the same model and hyperparameters:
model.fit(
training_set.make_one_shot_iterator(),
epochs = _EPOCH,
steps_per_epoch = len(train_x) // _BATCH_SIZE,
verbose = 1,
validation_data = validation_set.make_one_shot_iterator(),
validation_steps = len(val_x) // _BATCH_SIZE
)
And the log seems much different from the previous one:
Epoch 1/10
2005/2005 [==============================] - 54s 27ms/step - loss: 0.1451 - acc: 0.9419 - val_loss: 3.2980 - val_acc: 0.4975
Epoch 2/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1675 - acc: 0.9371 - val_loss: 3.0838 - val_acc: 0.4975
Epoch 3/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1821 - acc: 0.9316 - val_loss: 3.1212 - val_acc: 0.4975
Epoch 4/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1902 - acc: 0.9287 - val_loss: 3.0032 - val_acc: 0.4975
Epoch 5/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1905 - acc: 0.9283 - val_loss: 2.9671 - val_acc: 0.4975
Epoch 6/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1867 - acc: 0.9299 - val_loss: 2.8734 - val_acc: 0.4975
Epoch 7/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1802 - acc: 0.9316 - val_loss: 2.8651 - val_acc: 0.4975
Epoch 8/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1740 - acc: 0.9350 - val_loss: 2.8793 - val_acc: 0.4975
Epoch 9/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1660 - acc: 0.9388 - val_loss: 2.7894 - val_acc: 0.4975
Epoch 10/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1613 - acc: 0.9405 - val_loss: 2.7997 - val_acc: 0.4975
The validation loss stays high and val_acc never changes when I use the TensorFlow Dataset API to represent my data.
My questions are:
With the same model and parameters, why does model.fit() produce such different training results when I merely switch to the tf.data.Dataset API?
What is the difference between these two mechanisms?
model.fit(train_x,
train_y,
epochs=_EPOCH,
batch_size = _BATCH_SIZE,
verbose = 1,
validation_data = (val_x, val_y)
)
vs
model.fit(
training_set.make_one_shot_iterator(),
epochs = _EPOCH,
steps_per_epoch = len(train_x) // _BATCH_SIZE,
verbose = 1,
validation_data = validation_set.make_one_shot_iterator(),
validation_steps = len(val_x) // _BATCH_SIZE
)
How can I solve this strange problem if I have to use the tf.data.Dataset API?
When I use Keras to train a model with model.fit(), I see a progress bar that looks like this:
Epoch 1/10
8000/8000 [==========] - 55s 7ms/step - loss: 0.9318 - acc: 0.0783 - val_loss: 0.8631 - val_acc: 0.1180
Epoch 2/10
8000/8000 [==========] - 55s 7ms/step - loss: 0.6587 - acc: 0.1334 - val_loss: 0.7052 - val_acc: 0.1477
Epoch 3/10
8000/8000 [==========] - 54s 7ms/step - loss: 0.5701 - acc: 0.1526 - val_loss: 0.6445 - val_acc: 0.1632
To improve readability, I would like to have the epoch number on the same line as the progress bar, like this:
Epoch 1/10: 8000/8000 [==========] - 55s 7ms/step - loss: 0.9318 - acc: 0.0783 - val_loss: 0.8631 - val_acc: 0.1180
Epoch 2/10: 8000/8000 [==========] - 55s 7ms/step - loss: 0.6587 - acc: 0.1334 - val_loss: 0.7052 - val_acc: 0.1477
Epoch 3/10: 8000/8000 [==========] - 54s 7ms/step - loss: 0.5701 - acc: 0.1526 - val_loss: 0.6445 - val_acc: 0.1632
How can I make that change? I know that Keras has callbacks that can be invoked during training, but I am not familiar with how that works.
If you want to use an alternative, you could use tqdm (version >= 4.41.0):
from tqdm.keras import TqdmCallback
...
model.fit(..., verbose=0, callbacks=[TqdmCallback(verbose=2)])
This turns off keras' progress (verbose=0), and uses tqdm instead. For the callback, verbose=2 means separate progressbars for epochs and batches. 1 means clear batch bars when done. 0 means only show epochs (never show batch bars).
Yes, you can use callbacks (https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/Callback). For example:
import tensorflow as tf
class PrintLogs(tf.keras.callbacks.Callback):
    def __init__(self, epochs):
        self.epochs = epochs
    def set_params(self, params):
        params['epochs'] = 0
    def on_epoch_begin(self, epoch, logs=None):
        print('Epoch %d/%d' % (epoch + 1, self.epochs), end='')
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(512, activation=tf.nn.relu),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
epochs = 5
model.fit(x_train, y_train,
epochs=epochs,
validation_split=0.2,
verbose = 2,
callbacks=[PrintLogs(epochs)])
output:
Train on 48000 samples, validate on 12000 samples
Epoch 1/5 - 10s - loss: 0.0306 - acc: 0.9901 - val_loss: 0.0837 - val_acc: 0.9786
Epoch 2/5 - 9s - loss: 0.0269 - acc: 0.9910 - val_loss: 0.0839 - val_acc: 0.9788
Epoch 3/5 - 9s - loss: 0.0253 - acc: 0.9915 - val_loss: 0.0895 - val_acc: 0.9781
Epoch 4/5 - 9s - loss: 0.0201 - acc: 0.9930 - val_loss: 0.0871 - val_acc: 0.9792
Epoch 5/5 - 9s - loss: 0.0206 - acc: 0.9931 - val_loss: 0.0917 - val_acc: 0.9793
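A variation on the same idea is to build the whole line yourself from the `logs` dictionary that Keras passes to the callback. The helper below is a hypothetical, framework-free sketch of just the formatting step (`format_epoch_line` is not part of Keras, and the metric values are copied from the question for illustration):

```python
def format_epoch_line(epoch, total_epochs, logs):
    """Build 'Epoch i/N: name: value - ...' from a metrics dictionary."""
    metrics = " - ".join(f"{name}: {value:.4f}" for name, value in logs.items())
    return f"Epoch {epoch + 1}/{total_epochs}: {metrics}"

# Example with values taken from the question (epoch index is zero-based).
line = format_epoch_line(0, 10, {"loss": 0.9318, "acc": 0.0783})
print(line)
```

Such a function could be called from `on_epoch_end(self, epoch, logs=None)` of a `tf.keras.callbacks.Callback` subclass while passing `verbose=0` to `model.fit`, so that only the custom line is printed.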
I am trying to train my model by fine-tuning a pretrained model (vggface). My model has 12 classes with 1774 training images and 313 validation images, each class having around 150 images.
My model was overfitting, so I added dropout and FC layers with batch normalization to see how it goes. But the model still overfits:
train_data_path = 'dataset_cfps/train'
validation_data_path = 'dataset_cfps/validation'
# Parameters
img_width, img_height = 224, 224
vggface = VGGFace(model='resnet50', include_top=False, input_shape=(img_width, img_height, 3))
last_layer = vggface.get_layer('avg_pool').output
x = Flatten(name='flatten')(last_layer)
xx = Dense(1024, activation = 'softmax')(x)
x2 = Dropout(0.5)(xx)
y = Dense(1024, activation = 'softmax')(x2)
yy = BatchNormalization()(y)
y1 = Dropout(0.5)(yy)
x3 = Dense(12, activation='softmax', name='classifier')(y1)
custom_vgg_model = Model(vggface.input, x3)
# Create the model
model = models.Sequential()
# Add the convolutional base model
model.add(custom_vgg_model)
model.summary()
model = load_model('facenet_resnet_lr3_SGD_relu_1024.h5')
train_datagen = ImageDataGenerator(
rescale=1./255,
rotation_range=20,
width_shift_range=0.2,
height_shift_range=0.2,
horizontal_flip=True,
fill_mode='nearest')
validation_datagen = ImageDataGenerator(rescale=1./255)
# Change the batchsize according to your system RAM
train_batchsize = 32
val_batchsize = 32
train_generator = train_datagen.flow_from_directory(
train_data_path,
target_size=(img_width, img_height),
batch_size=train_batchsize,
class_mode='categorical')
validation_generator = validation_datagen.flow_from_directory(
validation_data_path,
target_size=(img_width, img_height),
batch_size=val_batchsize,
class_mode='categorical',
shuffle=True)
# Compile the model
model.compile(loss='categorical_crossentropy',
optimizer=optimizers.SGD(lr=1e-3),
metrics=['acc'])
# Train the model
history = model.fit_generator(
train_generator,
steps_per_epoch=train_generator.samples/train_generator.batch_size ,
epochs=100,
validation_data=validation_generator,
validation_steps=validation_generator.samples/validation_generator.batch_size,
verbose=1)
# Save the model
model.save('facenet_resnet_lr3_SGD_relu_1024_1.h5')
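A side note on the step counts in the code above: `train_generator.samples/train_generator.batch_size` is a float (1774 / 32 = 55.4375), which is most likely why the progress bar in the log reads `56/55`. A small sketch of the usual integer alternative, using the sample counts and batch size from this question (`steps_for` is a hypothetical helper):

```python
import math

def steps_for(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)

train_steps = steps_for(1774, 32)  # training images, train_batchsize
val_steps = steps_for(313, 32)     # validation images, val_batchsize
print(train_steps, val_steps)
```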
Here are the epochs:
Layer (type) Output Shape Param #
=================================================================
model_5 (Model) (None, 12) 26725324
=================================================================
Total params: 26,725,324
Trainable params: 26,670,156
Non-trainable params: 55,168
_________________________________________________________________
Found 1774 images belonging to 12 classes.
Found 313 images belonging to 12 classes.
.
.
.
Epoch 70/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5433 - acc: 0.8987 - val_loss: 0.8271 - val_acc: 0.7796
Epoch 71/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5353 - acc: 0.9145 - val_loss: 0.7954 - val_acc: 0.7508
Epoch 72/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5353 - acc: 0.8955 - val_loss: 0.8690 - val_acc: 0.7348
Epoch 73/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5310 - acc: 0.9037 - val_loss: 0.8673 - val_acc: 0.7476
Epoch 74/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5189 - acc: 0.8943 - val_loss: 0.8701 - val_acc: 0.7380
Epoch 75/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5333 - acc: 0.8952 - val_loss: 0.9399 - val_acc: 0.7188
Epoch 76/100
56/55 [==============================] - 49s 879ms/step - loss: 0.5106 - acc: 0.9043 - val_loss: 0.8107 - val_acc: 0.7700
Epoch 77/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5108 - acc: 0.9064 - val_loss: 0.9624 - val_acc: 0.6869
Epoch 78/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5214 - acc: 0.8994 - val_loss: 0.9602 - val_acc: 0.6933
Epoch 79/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5246 - acc: 0.9009 - val_loss: 0.8379 - val_acc: 0.7572
Epoch 80/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4859 - acc: 0.9082 - val_loss: 0.7856 - val_acc: 0.7796
Epoch 81/100
56/55 [==============================] - 49s 881ms/step - loss: 0.5005 - acc: 0.9175 - val_loss: 0.7609 - val_acc: 0.7827
Epoch 82/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4690 - acc: 0.9294 - val_loss: 0.7671 - val_acc: 0.7636
Epoch 83/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4897 - acc: 0.9146 - val_loss: 0.7902 - val_acc: 0.7636
Epoch 84/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4604 - acc: 0.9291 - val_loss: 0.7603 - val_acc: 0.7636
Epoch 85/100
56/55 [==============================] - 49s 881ms/step - loss: 0.4750 - acc: 0.9220 - val_loss: 0.7325 - val_acc: 0.7668
Epoch 86/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4524 - acc: 0.9266 - val_loss: 0.7782 - val_acc: 0.7636
Epoch 87/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4643 - acc: 0.9172 - val_loss: 0.9892 - val_acc: 0.6901
Epoch 88/100
56/55 [==============================] - 49s 881ms/step - loss: 0.4718 - acc: 0.9177 - val_loss: 0.8269 - val_acc: 0.7380
Epoch 89/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4646 - acc: 0.9290 - val_loss: 0.7846 - val_acc: 0.7604
Epoch 90/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4433 - acc: 0.9341 - val_loss: 0.7693 - val_acc: 0.7764
Epoch 91/100
56/55 [==============================] - 49s 877ms/step - loss: 0.4706 - acc: 0.9196 - val_loss: 0.8200 - val_acc: 0.7604
Epoch 92/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4572 - acc: 0.9184 - val_loss: 0.9220 - val_acc: 0.7220
Epoch 93/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4479 - acc: 0.9175 - val_loss: 0.8781 - val_acc: 0.7348
Epoch 94/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4793 - acc: 0.9100 - val_loss: 0.8035 - val_acc: 0.7572
Epoch 95/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4329 - acc: 0.9279 - val_loss: 0.7750 - val_acc: 0.7796
Epoch 96/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4361 - acc: 0.9212 - val_loss: 0.8124 - val_acc: 0.7508
Epoch 97/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4371 - acc: 0.9202 - val_loss: 0.9806 - val_acc: 0.7029
Epoch 98/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4298 - acc: 0.9149 - val_loss: 0.8637 - val_acc: 0.7380
Epoch 99/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4370 - acc: 0.9255 - val_loss: 0.8349 - val_acc: 0.7604
Epoch 100/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4407 - acc: 0.9205 - val_loss: 0.8477 - val_acc: 0.7508
Deep CNNs need a large amount of data for training. You have a small dataset, and the model is unable to generalize from it. You have two options:
reduce the network size
increase the size of the dataset
EDIT after comments on answer:
The model has some issues. You shouldn't use softmax for hidden layers.
To overcome the overfitting issue, you could freeze the pretrained layers and train only the newly added layers. If the model still overfits, you may remove some of the layers you have added or lower their number of units.
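To see why softmax is a poor choice for a hidden layer, note that its outputs always sum to 1, so across 1024 units the average activation is forced to 1/1024 regardless of the input. A NumPy sketch with random, made-up pre-activations (illustrative values only), comparing it with ReLU:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax along the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
pre_activations = rng.normal(size=(4, 1024))  # hypothetical batch of 4 samples

soft = softmax(pre_activations)          # each row is squashed to sum to 1
relu = np.maximum(pre_activations, 0.0)  # keeps the scale of positive inputs

print("softmax row sums:", soft.sum(axis=1))
print("softmax mean activation:", soft.mean())
print("relu mean activation:", relu.mean())
```

Freezing the pretrained part is typically done by setting `layer.trainable = False` on each layer of the base model before compiling.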