I'm having trouble getting my model to converge. I am following a paper that puts an SVM on top of a ResNet, but it's just not working. I read that the RandomFourierFeatures layer can be used as a quasi-substitute for an SVM in Keras.
# Instantiate ResNet 50 architecture
with strategy.scope():
    t = tf.keras.Input(shape=(256,256,3))
    basemodel = ResNet50(
        include_top=False,
        input_tensor=t,
        weights='imagenet'
    )
# Create ResNET 50 (RGB Channel)
# Pretrained on ImageNet
# Input: RGB Image ==> Output: 2048 element vector
with strategy.scope():
    rgb_model = basemodel.output
    rgb_model = AveragePooling2D(pool_size=(7,7))(rgb_model)
    rgb_model = Flatten()(rgb_model)
    rgb_model = Dense(1000)(rgb_model)
    rgb_model = RandomFourierFeatures(output_dim=2048, scale=5.0, kernel_initializer="gaussian", trainable=True)(rgb_model)
    rgb_model = Dense(len(classes), activation="linear")(rgb_model)
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
    model = tf.keras.Model(inputs=basemodel.input, outputs=rgb_model)
    model.compile(optimizer=optimizer,
                  loss='hinge',
                  metrics=[tf.keras.metrics.CategoricalAccuracy(name="acc")])
history = model.fit(train_dataset,
                    epochs=epochs,
                    steps_per_epoch=steps_per_epoch,
                    validation_data=val_dataset,
                    validation_steps=validation_steps)
This is the output I receive:
Epoch 1/50
2/234 [..............................] - ETA: 44:41 - loss: 1.4583 - acc: 0.0781 WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0051s vs `on_train_batch_end` time: 0.0790s). Check your callbacks.
WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0051s vs `on_train_batch_end` time: 0.0790s). Check your callbacks.
234/234 [==============================] - ETA: 0s - loss: 1.3060 - acc: 0.0452WARNING:tensorflow:Callbacks method `on_test_batch_end` is slow compared to the batch time (batch time: 0.0045s vs `on_test_batch_end` time: 0.0343s). Check your callbacks.
WARNING:tensorflow:Callbacks method `on_test_batch_end` is slow compared to the batch time (batch time: 0.0045s vs `on_test_batch_end` time: 0.0343s). Check your callbacks.
234/234 [==============================] - 75s 320ms/step - loss: 1.3060 - acc: 0.0452 - val_loss: 1.1811 - val_acc: 0.0365
Epoch 2/50
234/234 [==============================] - 21s 91ms/step - loss: 1.1190 - acc: 0.0527 - val_loss: 1.0879 - val_acc: 0.0469
Epoch 3/50
234/234 [==============================] - 21s 92ms/step - loss: 1.0570 - acc: 0.0513 - val_loss: 1.0394 - val_acc: 0.0521
Epoch 4/50
234/234 [==============================] - 21s 91ms/step - loss: 1.0192 - acc: 0.0536 - val_loss: 1.0011 - val_acc: 0.0938
Epoch 5/50
234/234 [==============================] - 21s 91ms/step - loss: 1.0005 - acc: 0.0612 - val_loss: 1.0003 - val_acc: 0.0729
Epoch 6/50
234/234 [==============================] - 21s 92ms/step - loss: 1.0003 - acc: 0.0612 - val_loss: 1.0002 - val_acc: 0.0521
Epoch 7/50
234/234 [==============================] - 22s 92ms/step - loss: 1.0002 - acc: 0.0646 - val_loss: 1.0001 - val_acc: 0.0573
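A side note on the log above: a hinge loss that settles at almost exactly 1.0 is what the formula mean(max(1 - y_true * y_pred, 0)) gives when every output is 0. The sketch below only illustrates that arithmetic in plain Python. The number of classes and the example vectors are hypothetical, and it does not prove that this is what happened in the run above. For multi-class targets Keras also offers a `categorical_hinge` loss, which may be closer to what the paper intends.

```python
def hinge(y_true, y_pred):
    """Mean of max(1 - y_true * y_pred, 0), with labels in {-1, +1}."""
    terms = [max(1.0 - t * p, 0.0) for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)

num_classes = 20   # hypothetical, not taken from the post
true_class = 3     # hypothetical

# One-hot target mapped to the {-1, +1} convention used by the hinge loss.
y_true = [1.0 if i == true_class else -1.0 for i in range(num_classes)]

# A network whose outputs have collapsed to zero for every class.
collapsed = [0.0] * num_classes
loss_collapsed = hinge(y_true, collapsed)

# A network that separates the classes with a margin larger than 1.
confident = [2.0 if i == true_class else -2.0 for i in range(num_classes)]
loss_confident = hinge(y_true, confident)

print(loss_collapsed, loss_confident)
```

If the outputs really are collapsing to zero, printing a few raw predictions from `model.predict` would show it directly.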
First of all, I know that there is a similar thread here:
https://stats.stackexchange.com/questions/352036/what-should-i-do-when-my-neural-network-doesnt-learn
But unfortunately, it does not help. I probably have a bug inside my code which I cannot find. What I am trying to do is to classify some WAV files. But the model does not learn.
First, I collect the files and save them in an array.
Second, I create new directories, one for train data and one for val data.
Next, I read the WAV files, create spectrograms, and save them all to the train directory.
Afterward, I move 20% of the data from the train directory to the val directory.
Note: While creating the spectrograms, I check the length of the WAV. If it is too short (less than 2 sec), I double it. From this spectrogram, I cut a random chunk and save only that. As a result, all images have the same height and width.
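The "double it if too short, then cut a random chunk" step from the note could look roughly like this. It is a minimal sketch that treats the signal as a plain list of frames. `random_chunk`, `chunk_len` and the toy data are hypothetical names, and repeating the doubling until the clip is long enough is an assumption that goes slightly beyond the single doubling described.

```python
import random

def random_chunk(frames, chunk_len, rng=random):
    """Return a random window of exactly chunk_len frames.

    Clips shorter than chunk_len are doubled (concatenated with
    themselves) until they are long enough.
    """
    if not frames:
        raise ValueError("cannot cut a chunk from an empty clip")
    while len(frames) < chunk_len:
        frames = frames + frames
    start = rng.randrange(len(frames) - chunk_len + 1)
    return frames[start:start + chunk_len]

# Toy example: a clip of 120 frames, cut to a fixed length of 200 frames.
short_clip = list(range(120))
chunk = random_chunk(short_clip, chunk_len=200, rng=random.Random(0))
print(len(chunk))
```

Because every chunk has the same length, all resulting images end up with the same width.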
In the next step, I load the train and val images and normalize them.
IMG_WIDTH=300
IMG_HEIGHT=300
IMG_DIM = (IMG_WIDTH, IMG_HEIGHT, 3)
train_files = glob.glob(DBMEL_PATH + "*",recursive=True)
train_imgs = [img_to_array(load_img(img, target_size=IMG_DIM)) for img in train_files]
train_imgs = np.array(train_imgs) / 255 # normalizing Data
train_labels = [fn.split('\\')[-1].split('.')[1].strip() for fn in train_files]
validation_files = glob.glob(DBMEL_VAL_PATH + "*",recursive=True)
validation_imgs = [img_to_array(load_img(img, target_size=IMG_DIM)) for img in validation_files]
validation_imgs = np.array(validation_imgs) / 255 # normalizing Data
validation_labels = [fn.split('\\')[-1].split('.')[1].strip() for fn in validation_files]
I have checked the variables by printing them, and this seems to work well. The arrays contain 80% and 20% of the total data, respectively.
#Train dataset shape: (3756, 300, 300, 3)
#Validation dataset shape: (939, 300, 300, 3)
Next, I have also implemented a One-Hot-Encoder.
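The encoder itself is not shown in the post, so the following is only a minimal sketch of what a one-hot encoder for string labels might look like. `one_hot_encode` and the example labels are hypothetical.

```python
def one_hot_encode(labels):
    """Map string labels to one-hot vectors, with classes sorted by name."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    encoded = [
        [1.0 if index[label] == i else 0.0 for i in range(len(classes))]
        for label in labels
    ]
    return classes, encoded

classes, encoded = one_hot_encode(["dog", "cat", "dog", "bird"])
print(classes)
print(encoded[0])
```

When debugging a model that does not learn, it is worth checking that the train and val labels are encoded with the same class order.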
So far so good. In the next step I create empty DataGenerators, i.e. without any data augmentation. When calling the DataGenerators, once for the train data and once for the val data, I pass the image arrays (train_imgs, validation_imgs) and the one-hot-encoded labels (train_labels_enc, validation_labels_enc).
Okay. Here now comes the tricky part.
First, create/load a pre-trained network
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.models import Model
import tensorflow.keras
input_shape=(IMG_HEIGHT,IMG_WIDTH,3)
restnet = ResNet50(include_top=False, weights='imagenet', input_shape=(IMG_HEIGHT,IMG_WIDTH,3))
output = restnet.layers[-1].output
output = tensorflow.keras.layers.Flatten()(output)
restnet = Model(restnet.input, output)
for layer in restnet.layers:
    layer.trainable = False
And now finally creating the model itself. While creating the model I am using the pre-trained network for transfer learning. I guess somewhere there must be a problem.
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, InputLayer
from tensorflow.keras.models import Sequential
from tensorflow.keras import optimizers
model = Sequential()
model.add(restnet) # <-- transfer learning
model.add(Dense(512, activation='relu', input_dim=input_shape))# 512 (num_classes)
model.add(Dropout(0.3))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(7, activation='softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
model.summary()
And the model is trained with this:
history = model.fit_generator(train_generator,
steps_per_epoch=100,
epochs=100,
validation_data=val_generator,
validation_steps=10,
verbose=1
)
But even after 50 epochs the accuracy stalls at around 0.15
Epoch 1/100
100/100 [==============================] - 711s 7s/step - loss: 10.6419 - accuracy: 0.1530 - val_loss: 1.9416 - val_accuracy: 0.1467
Epoch 2/100
100/100 [==============================] - 733s 7s/step - loss: 1.9595 - accuracy: 0.1550 - val_loss: 1.9372 - val_accuracy: 0.1267
Epoch 3/100
100/100 [==============================] - 731s 7s/step - loss: 1.9940 - accuracy: 0.1444 - val_loss: 1.9388 - val_accuracy: 0.1400
Epoch 4/100
100/100 [==============================] - 735s 7s/step - loss: 1.9416 - accuracy: 0.1535 - val_loss: 1.9380 - val_accuracy: 0.1733
Epoch 5/100
100/100 [==============================] - 737s 7s/step - loss: 1.9394 - accuracy: 0.1656 - val_loss: 1.9345 - val_accuracy: 0.1533
Epoch 6/100
100/100 [==============================] - 741s 7s/step - loss: 1.9364 - accuracy: 0.1667 - val_loss: 1.9286 - val_accuracy: 0.1767
Epoch 7/100
100/100 [==============================] - 740s 7s/step - loss: 1.9389 - accuracy: 0.1523 - val_loss: 1.9305 - val_accuracy: 0.1400
Epoch 8/100
100/100 [==============================] - 737s 7s/step - loss: 1.9394 - accuracy: 0.1623 - val_loss: 1.9441 - val_accuracy: 0.1667
Epoch 9/100
100/100 [==============================] - 735s 7s/step - loss: 1.9391 - accuracy: 0.1582 - val_loss: 1.9458 - val_accuracy: 0.1333
Epoch 10/100
100/100 [==============================] - 734s 7s/step - loss: 1.9381 - accuracy: 0.1602 - val_loss: 1.9372 - val_accuracy: 0.1700
Epoch 11/100
100/100 [==============================] - 739s 7s/step - loss: 1.9392 - accuracy: 0.1623 - val_loss: 1.9302 - val_accuracy: 0.2167
Epoch 12/100
100/100 [==============================] - 741s 7s/step - loss: 1.9368 - accuracy: 0.1627 - val_loss: 1.9326 - val_accuracy: 0.1467
Epoch 13/100
100/100 [==============================] - 740s 7s/step - loss: 1.9381 - accuracy: 0.1513 - val_loss: 1.9312 - val_accuracy: 0.1733
Epoch 14/100
100/100 [==============================] - 736s 7s/step - loss: 1.9396 - accuracy: 0.1542 - val_loss: 1.9407 - val_accuracy: 0.1367
Epoch 15/100
100/100 [==============================] - 741s 7s/step - loss: 1.9393 - accuracy: 0.1597 - val_loss: 1.9336 - val_accuracy: 0.1333
Epoch 16/100
100/100 [==============================] - 737s 7s/step - loss: 1.9375 - accuracy: 0.1659 - val_loss: 1.9354 - val_accuracy: 0.1267
Epoch 17/100
100/100 [==============================] - 741s 7s/step - loss: 1.9422 - accuracy: 0.1487 - val_loss: 1.9307 - val_accuracy: 0.1567
Epoch 18/100
100/100 [==============================] - 738s 7s/step - loss: 1.9399 - accuracy: 0.1680 - val_loss: 1.9408 - val_accuracy: 0.1567
Epoch 19/100
100/100 [==============================] - 743s 7s/step - loss: 1.9405 - accuracy: 0.1610 - val_loss: 1.9335 - val_accuracy: 0.1533
Epoch 20/100
100/100 [==============================] - 738s 7s/step - loss: 1.9410 - accuracy: 0.1575 - val_loss: 1.9331 - val_accuracy: 0.1533
Epoch 21/100
100/100 [==============================] - 746s 7s/step - loss: 1.9395 - accuracy: 0.1639 - val_loss: 1.9344 - val_accuracy: 0.1733
Epoch 22/100
100/100 [==============================] - 746s 7s/step - loss: 1.9393 - accuracy: 0.1585 - val_loss: 1.9354 - val_accuracy: 0.1667
Epoch 23/100
100/100 [==============================] - 746s 7s/step - loss: 1.9398 - accuracy: 0.1599 - val_loss: 1.9352 - val_accuracy: 0.1500
Epoch 24/100
100/100 [==============================] - 746s 7s/step - loss: 1.9392 - accuracy: 0.1585 - val_loss: 1.9449 - val_accuracy: 0.1667
Epoch 25/100
100/100 [==============================] - 746s 7s/step - loss: 1.9399 - accuracy: 0.1495 - val_loss: 1.9352 - val_accuracy: 0.1600
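For reference, the plateau in this log matches what a 7-class softmax gives when it predicts every class with equal probability. This is plain arithmetic rather than a diagnosis; the only value taken from the post is the 7 output units.

```python
import math

num_classes = 7  # size of the final Dense layer in the model above

# Accuracy of guessing, and cross-entropy of a uniform prediction.
chance_accuracy = 1.0 / num_classes
uniform_loss = -math.log(1.0 / num_classes)

print(round(chance_accuracy, 4), round(uniform_loss, 4))
```

A loss stuck near ln(7) with accuracy near 1/7 suggests the classifier head is not extracting any signal from the features it receives.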
Can anyone please help to find the problem?
I solved the problem on my own.
I exchanged this
model = Sequential()
model.add(restnet) # <-- transfer learning
model.add(Dense(512, activation='relu', input_dim=input_shape))# 512 (num_classes)
model.add(Dropout(0.3))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(7, activation='softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
model.summary()
with this:
base_model = tf.keras.applications.MobileNetV2(input_shape = (224, 224, 3), include_top = False, weights = "imagenet")
model = Sequential()
model.add(base_model)
model.add(tf.keras.layers.GlobalAveragePooling2D())
model.add(Dropout(0.2))
model.add(Dense(number_classes, activation="softmax"))
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.00001),
loss="categorical_crossentropy",
metrics=['accuracy'])
model.summary()
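One difference between the two heads can be put into numbers. The sketch below counts the trainable parameters of each classifier head, assuming that ResNet50 without its top produces a 10x10x2048 feature map for 300x300 inputs, that MobileNetV2 produces 7x7x1280 for 224x224 inputs, and that number_classes is 7 as in the question. It is only arithmetic, not a claim about which change actually fixed the training.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Old head: Flatten -> Dense(512) -> Dense(512) -> Dense(7)
flattened_features = 10 * 10 * 2048   # assumed ResNet50 output for 300x300
flatten_head = (
    dense_params(flattened_features, 512)
    + dense_params(512, 512)
    + dense_params(512, 7)
)

# New head: GlobalAveragePooling2D -> Dense(7)
pooled_features = 1280                # assumed MobileNetV2 channel count
pooled_head = dense_params(pooled_features, 7)

print(flatten_head, pooled_head)
```

With only 3756 training images, a head with roughly a hundred million weights is much harder to train than one with a few thousand. The new version also lowers the learning rate from Adam's default to 0.00001.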
And I found out one more thing. Contrary to some tutorials, data augmentation was not useful in my case when working with spectrograms.
Without data augmentation I got 0.99 on train-accuracy and 0.72 on val-accuracy. But with data augmentation I got only 0.75 on train-accuracy and 0.16 on val-accuracy.
I'm new to deep learning. I am trying to create a model, and I don't really understand the model.add(layers) calls. I'm sure about the input shape (it's for recognition). I think the problem is in the Dropout, but I don't understand the value.
Can someone explain the following to me?
model = models.Sequential()
model.add(layers.Conv2D(32, (3,3), activation = 'relu', input_shape = (128,128,3)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64, (3,3), activation = 'relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(6, activation='softmax'))
model.compile(loss='categorical_crossentropy',
optimizer=optimizers.Adam(lr=1e-4), metrics=['acc'])
-------------------------------------------------------
history = model.fit(
train_data,
train_labels,
epochs=30,
validation_data=(test_data, test_labels),
)
and here is the result:
Epoch 15/30
5/5 [==============================] - 0s 34ms/step - loss: 0.3987 - acc: 0.8536 - val_loss: 0.7021 - val_acc: 0.7143
Epoch 16/30
5/5 [==============================] - 0s 31ms/step - loss: 0.3223 - acc: 0.8891 - val_loss: 0.6393 - val_acc: 0.7778
Epoch 17/30
5/5 [==============================] - 0s 32ms/step - loss: 0.3321 - acc: 0.9082 - val_loss: 0.6229 - val_acc: 0.7460
Epoch 18/30
5/5 [==============================] - 0s 31ms/step - loss: 0.2615 - acc: 0.9409 - val_loss: 0.6591 - val_acc: 0.8095
Epoch 19/30
5/5 [==============================] - 0s 32ms/step - loss: 0.2161 - acc: 0.9857 - val_loss: 0.6368 - val_acc: 0.7143
Epoch 20/30
5/5 [==============================] - 0s 33ms/step - loss: 0.1773 - acc: 0.9857 - val_loss: 0.5644 - val_acc: 0.7778
Epoch 21/30
5/5 [==============================] - 0s 32ms/step - loss: 0.1650 - acc: 0.9782 - val_loss: 0.5459 - val_acc: 0.8413
Epoch 22/30
5/5 [==============================] - 0s 31ms/step - loss: 0.1534 - acc: 0.9789 - val_loss: 0.5738 - val_acc: 0.7460
Epoch 23/30
5/5 [==============================] - 0s 32ms/step - loss: 0.1205 - acc: 0.9921 - val_loss: 0.5351 - val_acc: 0.8095
Epoch 24/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0967 - acc: 1.0000 - val_loss: 0.5256 - val_acc: 0.8413
Epoch 25/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0736 - acc: 1.0000 - val_loss: 0.5493 - val_acc: 0.7937
Epoch 26/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0826 - acc: 1.0000 - val_loss: 0.5342 - val_acc: 0.8254
Epoch 27/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0687 - acc: 1.0000 - val_loss: 0.5452 - val_acc: 0.8254
Epoch 28/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0571 - acc: 1.0000 - val_loss: 0.5176 - val_acc: 0.7937
Epoch 29/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0549 - acc: 1.0000 - val_loss: 0.5142 - val_acc: 0.8095
Epoch 30/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0479 - acc: 1.0000 - val_loss: 0.5243 - val_acc: 0.8095
I had never exceeded about 70% on average before, but here I get 80%, though I think I'm overfitting. I have of course searched through various docs, but I'm lost.
Have you tried the following in your training?
Data Augmentation
Pre-trained Model
Looking at the execution time per epoch, it looks like your dataset is pretty small. Also, it's not clear whether there is any class imbalance in your dataset. You should probably try stratified cross-validation and analyze the per-fold results. It won't prevent overfitting, but it will give you more insight into your model, which generally helps to reduce overfitting. Preventing overfitting is a broad topic, though, and there are plenty of resources online. You can also try this:
import tensorflow as tf

model.compile(loss='categorical_crossentropy',
              optimizer='adam', metrics=['acc'])

# src: https://keras.io/api/callbacks/reduce_lr_on_plateau/
# Reduce the learning rate by a factor of 0.2 if val_loss
# does not improve within 5 epochs.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.2,
                                                 patience=5, min_lr=0.00001)
# src: https://keras.io/api/callbacks/early_stopping/
# Stop training if val_loss does not improve within 15 epochs.
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=15)
history = model.fit(
train_data,
train_labels,
epochs=30,
validation_data=(test_data, test_labels),
callbacks=[reduce_lr, early_stop]
)
You may also find ModelCheckpoint or LearningRateScheduler useful. None of this guarantees that the model won't overfit, but these are approaches worth adopting.
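The stratified cross-validation suggested above can be sketched in plain Python as follows. This is a simplified stand-in (no shuffling, hypothetical function name and toy labels). In practice scikit-learn's `StratifiedKFold` is the usual tool.

```python
from collections import defaultdict

def stratified_folds(labels, n_folds):
    """Assign sample indices to folds so each fold keeps the class ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        # Deal the samples of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds

# Toy data: 6 samples of class "a" and 3 of class "b".
labels = ["a"] * 6 + ["b"] * 3
folds = stratified_folds(labels, n_folds=3)
for fold in folds:
    print(sorted(fold), [labels[i] for i in sorted(fold)])
```

Each fold would then serve once as the validation set while the model is trained on the remaining folds, and the spread of val_acc across folds indicates how much of the 70% to 80% difference is noise.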
I'm new to Keras and I'm using it to build a plain neural network to classify digits from the MNIST dataset.
Beforehand I split the data into 3 parts: 55000 to train, 5000 to validate and 10000 to test, and I scaled the pixel values down (by dividing them by 255.0).
My model looks like this:
model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[28,28]))
model.add(keras.layers.Dense(100, activation='relu'))
model.add(keras.layers.Dense(10, activation='softmax'))
And here is the compile:
model.compile(loss='sparse_categorical_crossentropy',
optimizer = 'Adam',
metrics=['accuracy'])
I train the model:
his = model.fit(xTrain, yTrain, epochs = 20, validation_data=(xValid, yValid))
At first the val_loss decreases, then it increases although the accuracy is increasing.
Train on 55000 samples, validate on 5000 samples
Epoch 1/20
55000/55000 [==============================] - 5s 91us/sample - loss: 0.2822 - accuracy: 0.9199 - val_loss: 0.1471 - val_accuracy: 0.9588
Epoch 2/20
55000/55000 [==============================] - 5s 82us/sample - loss: 0.1274 - accuracy: 0.9626 - val_loss: 0.1011 - val_accuracy: 0.9710
Epoch 3/20
55000/55000 [==============================] - 5s 83us/sample - loss: 0.0899 - accuracy: 0.9734 - val_loss: 0.0939 - val_accuracy: 0.9742
Epoch 4/20
55000/55000 [==============================] - 5s 84us/sample - loss: 0.0674 - accuracy: 0.9796 - val_loss: 0.0760 - val_accuracy: 0.9770
Epoch 5/20
55000/55000 [==============================] - 5s 94us/sample - loss: 0.0541 - accuracy: 0.9836 - val_loss: 0.0842 - val_accuracy: 0.9742
Epoch 15/20
55000/55000 [==============================] - 4s 82us/sample - loss: 0.0103 - accuracy: 0.9967 - val_loss: 0.0963 - val_accuracy: 0.9788
Epoch 16/20
55000/55000 [==============================] - 5s 84us/sample - loss: 0.0092 - accuracy: 0.9973 - val_loss: 0.0956 - val_accuracy: 0.9774
Epoch 17/20
55000/55000 [==============================] - 5s 82us/sample - loss: 0.0081 - accuracy: 0.9977 - val_loss: 0.0977 - val_accuracy: 0.9770
Epoch 18/20
55000/55000 [==============================] - 5s 85us/sample - loss: 0.0076 - accuracy: 0.9977 - val_loss: 0.1057 - val_accuracy: 0.9760
Epoch 19/20
55000/55000 [==============================] - 5s 83us/sample - loss: 0.0063 - accuracy: 0.9980 - val_loss: 0.1108 - val_accuracy: 0.9774
Epoch 20/20
55000/55000 [==============================] - 5s 85us/sample - loss: 0.0066 - accuracy: 0.9980 - val_loss: 0.1056 - val_accuracy: 0.9768
And when I evaluate the loss is too high:
model.evaluate(xTest, yTest)
Result:
10000/10000 [==============================] - 0s 41us/sample - loss: 25.7150 - accuracy: 0.9740
[25.714989705941953, 0.974]
Is this ok, or is it a sign of overfitting? Should I do something to improve it? Thanks in advance.
Usually, it is not OK. You want the loss to be as small as possible. Your result is typical of overfitting: your network 'knows' its training data but isn't capable of analysing new images. You may want to add some layers, such as convolutional layers or a Dropout layer. Another idea would be to augment your training images; the ImageDataGenerator class provided by Keras might help you out here.
Another thing to look at could be your hyperparameters. Why do you use 100 nodes in the first dense layer? Something like 784 (28*28) may be more interesting if you want to start with a dense layer. I would suggest some combination of Convolutional-Dropout-Dense; then your dense layer may not need that many nodes.
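One more thing that may be worth ruling out, since the question does not show how xTest was prepared: a test loss of 25.7 next to a validation loss of about 0.1 at nearly the same accuracy would also fit test inputs that were not divided by 255.0. The sketch below is a rough illustration only. It treats unscaled inputs as if they simply multiplied the logits by 255, which is a simplification, and all numbers are hypothetical.

```python
import math

def cross_entropy(logits, true_index):
    """Softmax cross-entropy for one sample, computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[true_index]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

# A misclassified sample: the true class is 1, the model prefers class 0.
logits = [2.0, 1.0, 0.5]
scaled_logits = [255.0 * z for z in logits]

loss_small = cross_entropy(logits, true_index=1)
loss_big = cross_entropy(scaled_logits, true_index=1)

print(argmax(logits), argmax(scaled_logits))
print(round(loss_small, 3), round(loss_big, 3))
```

The predicted class is unchanged, so accuracy stays the same, while the few wrong predictions become extremely confident and dominate the average loss. Comparing `xTest.max()` with `xTrain.max()` would settle whether this applies here.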
I am using Keras with TensorFlow backend to train an LSTM network for some time-sequential data sets. The performance seems pretty good when I represent my training data (as well as the validation data) in the Numpy array format:
train_x.shape: (128346, 10, 34)
val_x.shape: (7941, 10, 34)
test_x.shape: (24181, 10, 34)
train_y.shape: (128346, 2)
val_y.shape: (7941, 2)
test_y.shape: (24181, 2)
P.S. 10 is the number of time steps and 34 is the number of features; the labels are one-hot encoded.
model = tf.keras.Sequential()
model.add(layers.LSTM(_HIDDEN_SIZE, return_sequences=True,
input_shape=(_TIME_STEPS, _FEATURE_DIMENTIONS)))
model.add(layers.Dropout(0.4))
model.add(layers.LSTM(_HIDDEN_SIZE, return_sequences=True))
model.add(layers.Dropout(0.3))
model.add(layers.TimeDistributed(layers.Dense(_NUM_CLASSES)))
model.add(layers.Flatten())
model.add(layers.Dense(_NUM_CLASSES, activation='softmax'))
opt = tf.keras.optimizers.Adam(lr = _LR)
model.compile(optimizer = opt, loss = 'categorical_crossentropy',
metrics = ['accuracy'])
model.fit(train_x,
train_y,
epochs=_EPOCH,
batch_size = _BATCH_SIZE,
verbose = 1,
validation_data = (val_x, val_y)
)
And the training results are:
Train on 128346 samples, validate on 7941 samples
Epoch 1/10
128346/128346 [==============================] - 50s 390us/step - loss: 0.5883 - acc: 0.6975 - val_loss: 0.5242 - val_acc: 0.7416
Epoch 2/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.4804 - acc: 0.7687 - val_loss: 0.4265 - val_acc: 0.8014
Epoch 3/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.4232 - acc: 0.8076 - val_loss: 0.4095 - val_acc: 0.8096
Epoch 4/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.3894 - acc: 0.8276 - val_loss: 0.3529 - val_acc: 0.8469
Epoch 5/10
128346/128346 [==============================] - 49s 382us/step - loss: 0.3610 - acc: 0.8430 - val_loss: 0.3283 - val_acc: 0.8593
Epoch 6/10
128346/128346 [==============================] - 49s 382us/step - loss: 0.3402 - acc: 0.8525 - val_loss: 0.3334 - val_acc: 0.8558
Epoch 7/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.3233 - acc: 0.8604 - val_loss: 0.2944 - val_acc: 0.8741
Epoch 8/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.3087 - acc: 0.8663 - val_loss: 0.2786 - val_acc: 0.8805
Epoch 9/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.2969 - acc: 0.8709 - val_loss: 0.2785 - val_acc: 0.8777
Epoch 10/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.2867 - acc: 0.8757 - val_loss: 0.2590 - val_acc: 0.8877
This log seems pretty normal, but when I use the TensorFlow Dataset API to represent my data sets, training behaves very strangely (it seems that the model overfits or underfits?):
def tfdata_generator(features, labels, is_training = False, batch_size = _BATCH_SIZE, epoch = _EPOCH):
    dataset = tf.data.Dataset.from_tensor_slices((features, tf.cast(labels, dtype = tf.uint8)))
    if is_training:
        dataset = dataset.shuffle(10000) # depends on sample size
    dataset = dataset.batch(batch_size, drop_remainder = True).repeat(epoch).prefetch(batch_size)
    return dataset
training_set = tfdata_generator(train_x, train_y, is_training=True)
validation_set = tfdata_generator(val_x, val_y, is_training=False)
testing_set = tfdata_generator(test_x, test_y, is_training=False)
Training on the same model and hyperparameters:
model.fit(
training_set.make_one_shot_iterator(),
epochs = _EPOCH,
steps_per_epoch = len(train_x) // _BATCH_SIZE,
verbose = 1,
validation_data = validation_set.make_one_shot_iterator(),
validation_steps = len(val_x) // _BATCH_SIZE
)
And the log seems much different from the previous one:
Epoch 1/10
2005/2005 [==============================] - 54s 27ms/step - loss: 0.1451 - acc: 0.9419 - val_loss: 3.2980 - val_acc: 0.4975
Epoch 2/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1675 - acc: 0.9371 - val_loss: 3.0838 - val_acc: 0.4975
Epoch 3/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1821 - acc: 0.9316 - val_loss: 3.1212 - val_acc: 0.4975
Epoch 4/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1902 - acc: 0.9287 - val_loss: 3.0032 - val_acc: 0.4975
Epoch 5/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1905 - acc: 0.9283 - val_loss: 2.9671 - val_acc: 0.4975
Epoch 6/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1867 - acc: 0.9299 - val_loss: 2.8734 - val_acc: 0.4975
Epoch 7/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1802 - acc: 0.9316 - val_loss: 2.8651 - val_acc: 0.4975
Epoch 8/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1740 - acc: 0.9350 - val_loss: 2.8793 - val_acc: 0.4975
Epoch 9/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1660 - acc: 0.9388 - val_loss: 2.7894 - val_acc: 0.4975
Epoch 10/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1613 - acc: 0.9405 - val_loss: 2.7997 - val_acc: 0.4975
The validation loss does not decrease and val_acc always has the same value when I use the TensorFlow Dataset API to represent my data.
My questions are:
Based on the same model and parameters, why does model.fit() give such different training results when I merely switch to the tf.data.Dataset API?
What is the difference between these two mechanisms?
model.fit(train_x,
train_y,
epochs=_EPOCH,
batch_size = _BATCH_SIZE,
verbose = 1,
validation_data = (val_x, val_y)
)
vs
model.fit(
training_set.make_one_shot_iterator(),
epochs = _EPOCH,
steps_per_epoch = len(train_x) // _BATCH_SIZE,
verbose = 1,
validation_data = validation_set.make_one_shot_iterator(),
validation_steps = len(val_x) // _BATCH_SIZE
)
How can I solve this strange problem if I have to use the tf.data.Dataset API?
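One difference between the two setups that might be worth checking (an assumption, because the post does not say how train_x is ordered): model.fit on NumPy arrays shuffles the whole training set, whereas dataset.shuffle(10000) only mixes elements within a 10000-element buffer, and the training set has 128346 samples. The pure-Python sketch below imitates a buffered shuffle on toy labels that are sorted by class. `buffered_shuffle` is a hypothetical illustration, not the tf.data implementation.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in the order a fixed-size shuffle buffer would emit them."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)      # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]            # emit a random buffered element
        buffer[idx] = item           # and replace it with the new one
    rng.shuffle(buffer)              # drain what is left at the end
    yield from buffer

# Toy labels sorted by class: 1000 samples of class 0, then 1000 of class 1.
labels = [0] * 1000 + [1] * 1000

small_buffer = list(buffered_shuffle(labels, buffer_size=100))
full_buffer = list(buffered_shuffle(labels, buffer_size=len(labels)))

print(set(small_buffer[:500]), set(full_buffer[:500]))
```

With the small buffer, the first 500 emitted samples all belong to class 0, so early batches are single-class. If the data is ordered like this, using a buffer at least as large as the dataset, or shuffling the arrays before building the dataset, would be the thing to try.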
I am trying to train my model by fine-tuning a pretrained model (vggface). My model has 12 classes with 1774 training images and 313 validation images, each class having around 150 images.
My model was overfitting so I added dropout and FC layers with batch normalization to see how it goes. But still, the model overfits:
train_data_path = 'dataset_cfps/train'
validation_data_path = 'dataset_cfps/validation'
# Parameters
img_width, img_height = 224, 224
vggface = VGGFace(model='resnet50', include_top=False, input_shape=(img_width, img_height, 3))
last_layer = vggface.get_layer('avg_pool').output
x = Flatten(name='flatten')(last_layer)
xx = Dense(1024, activation = 'softmax')(x)
x2 = Dropout(0.5)(xx)
y = Dense(1024, activation = 'softmax')(x2)
yy = BatchNormalization()(y)
y1 = Dropout(0.5)(yy)
x3 = Dense(12, activation='softmax', name='classifier')(y1)
custom_vgg_model = Model(vggface.input, x3)
# Create the model
model = models.Sequential()
# Add the convolutional base model
model.add(custom_vgg_model)
model.summary()
model = load_model('facenet_resnet_lr3_SGD_relu_1024.h5')
train_datagen = ImageDataGenerator(
rescale=1./255,
rotation_range=20,
width_shift_range=0.2,
height_shift_range=0.2,
horizontal_flip=True,
fill_mode='nearest')
validation_datagen = ImageDataGenerator(rescale=1./255)
# Change the batchsize according to your system RAM
train_batchsize = 32
val_batchsize = 32
train_generator = train_datagen.flow_from_directory(
train_data_path,
target_size=(img_width, img_height),
batch_size=train_batchsize,
class_mode='categorical')
validation_generator = validation_datagen.flow_from_directory(
validation_data_path,
target_size=(img_width, img_height),
batch_size=val_batchsize,
class_mode='categorical',
shuffle=True)
# Compile the model
model.compile(loss='categorical_crossentropy',
optimizer=optimizers.SGD(lr=1e-3),
metrics=['acc'])
# Train the model
history = model.fit_generator(
train_generator,
steps_per_epoch=train_generator.samples/train_generator.batch_size ,
epochs=100,
validation_data=validation_generator,
validation_steps=validation_generator.samples/validation_generator.batch_size,
verbose=1)
# Save the model
model.save('facenet_resnet_lr3_SGD_relu_1024_1.h5')
Here are the epochs:
(type) Output Shape Param #
=================================================================
model_5 (Model) (None, 12) 26725324
=================================================================
Total params: 26,725,324
Trainable params: 26,670,156
Non-trainable params: 55,168
_________________________________________________________________
Found 1774 images belonging to 12 classes.
Found 313 images belonging to 12 classes.
.
.
.
Epoch 70/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5433 - acc: 0.8987 - val_loss: 0.8271 - val_acc: 0.7796
Epoch 71/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5353 - acc: 0.9145 - val_loss: 0.7954 - val_acc: 0.7508
Epoch 72/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5353 - acc: 0.8955 - val_loss: 0.8690 - val_acc: 0.7348
Epoch 73/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5310 - acc: 0.9037 - val_loss: 0.8673 - val_acc: 0.7476
Epoch 74/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5189 - acc: 0.8943 - val_loss: 0.8701 - val_acc: 0.7380
Epoch 75/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5333 - acc: 0.8952 - val_loss: 0.9399 - val_acc: 0.7188
Epoch 76/100
56/55 [==============================] - 49s 879ms/step - loss: 0.5106 - acc: 0.9043 - val_loss: 0.8107 - val_acc: 0.7700
Epoch 77/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5108 - acc: 0.9064 - val_loss: 0.9624 - val_acc: 0.6869
Epoch 78/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5214 - acc: 0.8994 - val_loss: 0.9602 - val_acc: 0.6933
Epoch 79/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5246 - acc: 0.9009 - val_loss: 0.8379 - val_acc: 0.7572
Epoch 80/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4859 - acc: 0.9082 - val_loss: 0.7856 - val_acc: 0.7796
Epoch 81/100
56/55 [==============================] - 49s 881ms/step - loss: 0.5005 - acc: 0.9175 - val_loss: 0.7609 - val_acc: 0.7827
Epoch 82/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4690 - acc: 0.9294 - val_loss: 0.7671 - val_acc: 0.7636
Epoch 83/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4897 - acc: 0.9146 - val_loss: 0.7902 - val_acc: 0.7636
Epoch 84/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4604 - acc: 0.9291 - val_loss: 0.7603 - val_acc: 0.7636
Epoch 85/100
56/55 [==============================] - 49s 881ms/step - loss: 0.4750 - acc: 0.9220 - val_loss: 0.7325 - val_acc: 0.7668
Epoch 86/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4524 - acc: 0.9266 - val_loss: 0.7782 - val_acc: 0.7636
Epoch 87/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4643 - acc: 0.9172 - val_loss: 0.9892 - val_acc: 0.6901
Epoch 88/100
56/55 [==============================] - 49s 881ms/step - loss: 0.4718 - acc: 0.9177 - val_loss: 0.8269 - val_acc: 0.7380
Epoch 89/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4646 - acc: 0.9290 - val_loss: 0.7846 - val_acc: 0.7604
Epoch 90/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4433 - acc: 0.9341 - val_loss: 0.7693 - val_acc: 0.7764
Epoch 91/100
56/55 [==============================] - 49s 877ms/step - loss: 0.4706 - acc: 0.9196 - val_loss: 0.8200 - val_acc: 0.7604
Epoch 92/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4572 - acc: 0.9184 - val_loss: 0.9220 - val_acc: 0.7220
Epoch 93/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4479 - acc: 0.9175 - val_loss: 0.8781 - val_acc: 0.7348
Epoch 94/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4793 - acc: 0.9100 - val_loss: 0.8035 - val_acc: 0.7572
Epoch 95/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4329 - acc: 0.9279 - val_loss: 0.7750 - val_acc: 0.7796
Epoch 96/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4361 - acc: 0.9212 - val_loss: 0.8124 - val_acc: 0.7508
Epoch 97/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4371 - acc: 0.9202 - val_loss: 0.9806 - val_acc: 0.7029
Epoch 98/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4298 - acc: 0.9149 - val_loss: 0.8637 - val_acc: 0.7380
Epoch 99/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4370 - acc: 0.9255 - val_loss: 0.8349 - val_acc: 0.7604
Epoch 100/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4407 - acc: 0.9205 - val_loss: 0.8477 - val_acc: 0.7508
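The `56/55` in the progress bar above comes from passing a non-integer number of steps: 1774 / 32 is 55.4375. A small sketch of rounding the step counts up so that every image is seen once per epoch, using the sample counts and batch size from the post:

```python
import math

train_samples = 1774
val_samples = 313
batch_size = 32

# Round up so the last, smaller batch is included.
steps_per_epoch = math.ceil(train_samples / batch_size)
validation_steps = math.ceil(val_samples / batch_size)

print(steps_per_epoch, validation_steps)
```

These integers can be passed as steps_per_epoch and validation_steps in place of the raw divisions. This only tidies the bookkeeping and is not expected to change the overfitting.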
Deep CNNs need a large amount of training data. You have a small dataset, and the model is unable to generalize from it. You have two options:
reduce the network size
increase the size of the dataset
EDIT after comments on answer:
The model has some issues. You should not use softmax for hidden layers.
If you want to overcome the overfitting, freeze the pretrained layers and train only the newly added layers. If the model still overfits, you can remove some of the layers you added or lower their number of units.
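To illustrate the point about softmax in hidden layers, here is a small plain-Python comparison on hypothetical pre-activations for a 1024-unit layer. Softmax forces the 1024 outputs to sum to 1, so each unit passes on only a tiny value, whereas ReLU keeps the scale of the positive inputs.

```python
import math

def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def relu(values):
    return [max(0.0, v) for v in values]

# Hypothetical pre-activations of a 1024-unit hidden layer.
pre_activations = [3.0 * math.sin(i) for i in range(1024)]

soft = softmax(pre_activations)
rect = relu(pre_activations)

print(sum(soft), max(soft), max(rect))
```

In the model from the question this would mean changing the two `Dense(1024, activation = 'softmax')` layers to `activation = 'relu'` and keeping softmax only on the final 12-unit classifier.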