CNN in Keras overfitting even after dropout and batch normalization

I am trying to train my model by fine-tuning a pretrained model (VGGFace). My model has 12 classes, with 1774 training images and 313 validation images, each class having around 150 images.
My model was overfitting, so I added dropout and FC layers with batch normalization to see how it goes. But the model still overfits:
# Imports assumed from the question's setup (standalone Keras + keras_vggface).
from keras import models, optimizers
from keras.models import Model, load_model
from keras.layers import Flatten, Dense, Dropout, BatchNormalization
from keras.preprocessing.image import ImageDataGenerator
from keras_vggface.vggface import VGGFace

train_data_path = 'dataset_cfps/train'
validation_data_path = 'dataset_cfps/validation'

# Parameters
img_width, img_height = 224, 224

vggface = VGGFace(model='resnet50', include_top=False, input_shape=(img_width, img_height, 3))

last_layer = vggface.get_layer('avg_pool').output
x = Flatten(name='flatten')(last_layer)
xx = Dense(1024, activation='softmax')(x)
x2 = Dropout(0.5)(xx)
y = Dense(1024, activation='softmax')(x2)
yy = BatchNormalization()(y)
y1 = Dropout(0.5)(yy)
x3 = Dense(12, activation='softmax', name='classifier')(y1)

custom_vgg_model = Model(vggface.input, x3)

# Create the model
model = models.Sequential()
# Add the convolutional base model
model.add(custom_vgg_model)
model.summary()
model = load_model('facenet_resnet_lr3_SGD_relu_1024.h5')

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

validation_datagen = ImageDataGenerator(rescale=1./255)

# Change the batch size according to your system RAM
train_batchsize = 32
val_batchsize = 32

train_generator = train_datagen.flow_from_directory(
    train_data_path,
    target_size=(img_width, img_height),
    batch_size=train_batchsize,
    class_mode='categorical')

validation_generator = validation_datagen.flow_from_directory(
    validation_data_path,
    target_size=(img_width, img_height),
    batch_size=val_batchsize,
    class_mode='categorical',
    shuffle=True)

# Compile the model
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.SGD(lr=1e-3),
              metrics=['acc'])

# Train the model
history = model.fit_generator(
    train_generator,
    steps_per_epoch=train_generator.samples / train_generator.batch_size,
    epochs=100,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples / validation_generator.batch_size,
    verbose=1)

# Save the model
model.save('facenet_resnet_lr3_SGD_relu_1024_1.h5')
Here are the model summary and the training epochs:
Layer (type)                 Output Shape              Param #
=================================================================
model_5 (Model)              (None, 12)                26725324
=================================================================
Total params: 26,725,324
Trainable params: 26,670,156
Non-trainable params: 55,168
_________________________________________________________________
Found 1774 images belonging to 12 classes.
Found 313 images belonging to 12 classes.
.
.
.
Epoch 70/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5433 - acc: 0.8987 - val_loss: 0.8271 - val_acc: 0.7796
Epoch 71/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5353 - acc: 0.9145 - val_loss: 0.7954 - val_acc: 0.7508
Epoch 72/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5353 - acc: 0.8955 - val_loss: 0.8690 - val_acc: 0.7348
Epoch 73/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5310 - acc: 0.9037 - val_loss: 0.8673 - val_acc: 0.7476
Epoch 74/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5189 - acc: 0.8943 - val_loss: 0.8701 - val_acc: 0.7380
Epoch 75/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5333 - acc: 0.8952 - val_loss: 0.9399 - val_acc: 0.7188
Epoch 76/100
56/55 [==============================] - 49s 879ms/step - loss: 0.5106 - acc: 0.9043 - val_loss: 0.8107 - val_acc: 0.7700
Epoch 77/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5108 - acc: 0.9064 - val_loss: 0.9624 - val_acc: 0.6869
Epoch 78/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5214 - acc: 0.8994 - val_loss: 0.9602 - val_acc: 0.6933
Epoch 79/100
56/55 [==============================] - 49s 880ms/step - loss: 0.5246 - acc: 0.9009 - val_loss: 0.8379 - val_acc: 0.7572
Epoch 80/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4859 - acc: 0.9082 - val_loss: 0.7856 - val_acc: 0.7796
Epoch 81/100
56/55 [==============================] - 49s 881ms/step - loss: 0.5005 - acc: 0.9175 - val_loss: 0.7609 - val_acc: 0.7827
Epoch 82/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4690 - acc: 0.9294 - val_loss: 0.7671 - val_acc: 0.7636
Epoch 83/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4897 - acc: 0.9146 - val_loss: 0.7902 - val_acc: 0.7636
Epoch 84/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4604 - acc: 0.9291 - val_loss: 0.7603 - val_acc: 0.7636
Epoch 85/100
56/55 [==============================] - 49s 881ms/step - loss: 0.4750 - acc: 0.9220 - val_loss: 0.7325 - val_acc: 0.7668
Epoch 86/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4524 - acc: 0.9266 - val_loss: 0.7782 - val_acc: 0.7636
Epoch 87/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4643 - acc: 0.9172 - val_loss: 0.9892 - val_acc: 0.6901
Epoch 88/100
56/55 [==============================] - 49s 881ms/step - loss: 0.4718 - acc: 0.9177 - val_loss: 0.8269 - val_acc: 0.7380
Epoch 89/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4646 - acc: 0.9290 - val_loss: 0.7846 - val_acc: 0.7604
Epoch 90/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4433 - acc: 0.9341 - val_loss: 0.7693 - val_acc: 0.7764
Epoch 91/100
56/55 [==============================] - 49s 877ms/step - loss: 0.4706 - acc: 0.9196 - val_loss: 0.8200 - val_acc: 0.7604
Epoch 92/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4572 - acc: 0.9184 - val_loss: 0.9220 - val_acc: 0.7220
Epoch 93/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4479 - acc: 0.9175 - val_loss: 0.8781 - val_acc: 0.7348
Epoch 94/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4793 - acc: 0.9100 - val_loss: 0.8035 - val_acc: 0.7572
Epoch 95/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4329 - acc: 0.9279 - val_loss: 0.7750 - val_acc: 0.7796
Epoch 96/100
56/55 [==============================] - 49s 879ms/step - loss: 0.4361 - acc: 0.9212 - val_loss: 0.8124 - val_acc: 0.7508
Epoch 97/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4371 - acc: 0.9202 - val_loss: 0.9806 - val_acc: 0.7029
Epoch 98/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4298 - acc: 0.9149 - val_loss: 0.8637 - val_acc: 0.7380
Epoch 99/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4370 - acc: 0.9255 - val_loss: 0.8349 - val_acc: 0.7604
Epoch 100/100
56/55 [==============================] - 49s 880ms/step - loss: 0.4407 - acc: 0.9205 - val_loss: 0.8477 - val_acc: 0.7508

Deep CNNs need a large amount of data for training. You have a small dataset, and the model is unable to generalize from it. You have two options:
reduce the network size
increase the size of the dataset
EDIT after comments on answer:
The model has some issues. You shouldn't use softmax as the activation for hidden layers.
If you want to overcome the overfitting issue, freeze the pretrained layers and train only the newly added layers. If the model still overfits, remove some of the layers you have added or lower their number of units.
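A minimal sketch of that advice, assuming the same VGGFace base, layer names, and imports as in the question (the 1024-unit layer size is kept only for illustration):

# Sketch: freeze the pretrained base and use relu (not softmax) in the hidden layers.
vggface = VGGFace(model='resnet50', include_top=False, input_shape=(224, 224, 3))
for layer in vggface.layers:
    layer.trainable = False  # train only the newly added head

x = Flatten(name='flatten')(vggface.get_layer('avg_pool').output)
x = Dense(1024, activation='relu')(x)
x = Dropout(0.5)(x)
x = BatchNormalization()(x)
out = Dense(12, activation='softmax', name='classifier')(x)
custom_vgg_model = Model(vggface.input, out)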

Related

Is it possible to use pre-trained BERT for CNN-based image captioning

Is it possible to use pre-trained BERT or GPT transformers for an image-captioning task using CNN features rather than a vision transformer?
I tried the following code to use BERT as a decoder, but I got bad performance:
import transformers
from tensorflow.keras import layers

class TransformerDecoderBlock(layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.bert = transformers.TFBertModel.from_pretrained("bert-base-uncased")
        # VOCAB_SIZE is defined elsewhere in the question's code
        self.out = layers.Dense(VOCAB_SIZE, activation="softmax")

    def call(self, inputs, encoder_outputs, training, mask=None):
        bert_output = self.bert(inputs, attention_mask=mask)[0]
        bert_output = self.out(bert_output)
        return bert_output
Epoch 1/100
21/21 [==============================] - 277s 6s/step - loss: 84.5721 - acc: 0.2112 - val_loss: 54.3953 - val_acc: 0.5427
Epoch 2/100
21/21 [==============================] - 91s 4s/step - loss: 37.8627 - acc: 0.6323 - val_loss: 27.0907 - val_acc: 0.7696
Epoch 3/100
21/21 [==============================] - 92s 4s/step - loss: 20.8055 - acc: 0.8054 - val_loss: 16.3021 - val_acc: 0.8698
Epoch 4/100
21/21 [==============================] - 92s 4s/step - loss: 12.6519 - acc: 0.9003 - val_loss: 11.2835 - val_acc: 0.9197
Epoch 5/100
21/21 [==============================] - 92s 4s/step - loss: 8.2166 - acc: 0.9438 - val_loss: 8.3427 - val_acc: 0.9427
Epoch 6/100
21/21 [==============================] - 93s 4s/step - loss: 5.6399 - acc: 0.9688 - val_loss: 6.7389 - val_acc: 0.9550
Epoch 7/100
21/21 [==============================] - 96s 5s/step - loss: 3.9650 - acc: 0.9831 - val_loss: 17.6212 - val_acc: 0.9077
2023-01-24 12:00:34.517913: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8401
{'testlen': 670, 'reflen': 2129, 'guess': [670, 0, 0, 0], 'correct': [288, 0, 0, 0]}
ratio: 0.31470173790497197
metric = {'Bleu_1': 0.048707163144657444, 'Bleu_2': 7.429062176481851e-05, 'Bleu_3': 8.551606570428752e-06, 'Bleu_4': 2.9013797251043654e-06, 'ROUGE_L': 0.09645392003732325, 'CIDEr': 0.009139851909119048}

CNN Low test Accuracy

I am building a CNN model to classify images into 11 classes: 'dew', 'fogsmog', 'frost', 'glaze', 'hail', 'lightning', 'rain', 'rainbow', 'rime', 'sandstorm', 'snow'.
While training, I get good training accuracy and good validation accuracy:
Epoch 1/20
131/131 [==============================] - 1012s 8s/step - loss: 1.8284 - accuracy: 0.3724 - val_loss: 1.4365 - val_accuracy: 0.5719
Epoch 2/20
131/131 [==============================] - 67s 511ms/step - loss: 1.3041 - accuracy: 0.5516 - val_loss: 1.1048 - val_accuracy: 0.6515
Epoch 3/20
131/131 [==============================] - 67s 510ms/step - loss: 1.1547 - accuracy: 0.6161 - val_loss: 1.0509 - val_accuracy: 0.6732
Epoch 4/20
131/131 [==============================] - 67s 510ms/step - loss: 1.0681 - accuracy: 0.6394 - val_loss: 1.0644 - val_accuracy: 0.6616
Epoch 5/20
131/131 [==============================] - 66s 505ms/step - loss: 1.0269 - accuracy: 0.6509 - val_loss: 1.0929 - val_accuracy: 0.6363
Epoch 6/20
131/131 [==============================] - 66s 506ms/step - loss: 1.0018 - accuracy: 0.6576 - val_loss: 0.9666 - val_accuracy: 0.6869
Epoch 7/20
131/131 [==============================] - 67s 507ms/step - loss: 0.9384 - accuracy: 0.6790 - val_loss: 0.8623 - val_accuracy: 0.7144
Epoch 8/20
131/131 [==============================] - 66s 505ms/step - loss: 0.9160 - accuracy: 0.6903 - val_loss: 0.8834 - val_accuracy: 0.7180
Epoch 9/20
131/131 [==============================] - 66s 502ms/step - loss: 0.8909 - accuracy: 0.6915 - val_loss: 0.8667 - val_accuracy: 0.7050
Epoch 10/20
131/131 [==============================] - 66s 503ms/step - loss: 0.8476 - accuracy: 0.7075 - val_loss: 0.8100 - val_accuracy: 0.7339
Epoch 11/20
131/131 [==============================] - 67s 509ms/step - loss: 0.8108 - accuracy: 0.7262 - val_loss: 0.8352 - val_accuracy: 0.7137
Epoch 12/20
131/131 [==============================] - 66s 506ms/step - loss: 0.7922 - accuracy: 0.7212 - val_loss: 0.8368 - val_accuracy: 0.7195
Epoch 13/20
131/131 [==============================] - 66s 505ms/step - loss: 0.7424 - accuracy: 0.7442 - val_loss: 0.8813 - val_accuracy: 0.7166
Epoch 14/20
131/131 [==============================] - 66s 503ms/step - loss: 0.7060 - accuracy: 0.7579 - val_loss: 0.8453 - val_accuracy: 0.7231
Epoch 15/20
131/131 [==============================] - 66s 503ms/step - loss: 0.6767 - accuracy: 0.7584 - val_loss: 0.8347 - val_accuracy: 0.7151
Epoch 16/20
131/131 [==============================] - 66s 506ms/step - loss: 0.6692 - accuracy: 0.7632 - val_loss: 0.8038 - val_accuracy: 0.7346
Epoch 17/20
131/131 [==============================] - 67s 507ms/step - loss: 0.6308 - accuracy: 0.7718 - val_loss: 0.7956 - val_accuracy: 0.7455
Epoch 18/20
131/131 [==============================] - 67s 508ms/step - loss: 0.6043 - accuracy: 0.7901 - val_loss: 0.8295 - val_accuracy: 0.7477
Epoch 19/20
131/131 [==============================] - 66s 506ms/step - loss: 0.5632 - accuracy: 0.8018 - val_loss: 0.7918 - val_accuracy: 0.7455
Epoch 20/20
131/131 [==============================] - 67s 510ms/step - loss: 0.5368 - accuracy: 0.8138 - val_loss: 0.7798 - val_accuracy: 0.7549
But when I predict and submit my results, I get very low accuracy.
Here is my model:
from keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = 50
datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.25)

train_dataset = datagen.flow_from_directory(
    directory=Train_folder,
    shuffle=True,
    target_size=(50, 50),
    subset="training",
    classes=['dew','fogsmog','frost','glaze','hail','lightning','rain','rainbow','rime','sandstorm','snow'],
    class_mode='categorical')

validation_dataset = datagen.flow_from_directory(
    directory=Train_folder,
    shuffle=True,
    target_size=(50, 50),
    subset="validation",
    classes=['dew','fogsmog','frost','glaze','hail','lightning','rain','rainbow','rime','sandstorm','snow'],
    class_mode='categorical')
Found 4168 images belonging to 11 classes.
Found 1383 images belonging to 11 classes.
model = Sequential([
    layers.Conv2D(32, kernel_size=(3, 3), activation="relu", padding='same', input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    layers.MaxPooling2D((2, 2), padding='same'),
    layers.Dropout(0.25),
    layers.Conv2D(64, (3, 3), activation="relu", padding='same'),
    layers.MaxPooling2D(pool_size=(2, 2), padding='same'),
    layers.Dropout(0.25),
    layers.Conv2D(128, (3, 3), activation="relu", padding='same'),
    layers.MaxPooling2D(pool_size=(2, 2), padding='same'),
    layers.Dropout(0.4),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(11, activation='softmax')
])
model.build()
model.summary()

model.compile(optimizer='adam',
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy'])

history = model.fit(
    train_dataset,
    epochs=20,
    validation_data=validation_dataset,
)
model.save('model.tfl')
Test_folder = "/content/drive/MyDrive/[NN'22] Project Dataset/Test"
test_data = []
labels = []
for img in tqdm(os.listdir(Test_folder)):
    path = os.path.join(Test_folder, img)
    img_data2 = cv2.imread(path)
    try:
        img_data2 = cv2.resize(img_data2, (IMG_SIZE, IMG_SIZE))
    except:
        continue
    test_data.append([np.array(img_data2)])
    labels.append(img)
X_data = np.array([test_data]).reshape(-1, IMG_SIZE, IMG_SIZE, 3)
prediction = model.predict([X_data])
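One detail worth checking: the training and validation generators rescale pixels with 1./255, but the test images above are passed to model.predict unscaled (raw 0-255 values), which by itself can produce very low test accuracy. A minimal sketch of matching the preprocessing, assuming the same IMG_SIZE, test_data, and model as above:

# Sketch: apply the same 1./255 rescaling used at training time before predicting.
X_data = np.array(test_data, dtype='float32').reshape(-1, IMG_SIZE, IMG_SIZE, 3)
X_data /= 255.0
prediction = model.predict(X_data)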

How can I prevent my model from overfitting?

I'm a newbie in deep learning. I'm trying to create a model, and I don't really understand the model.add(layers) calls. I'm sure about the input shape (it's for image recognition). I think the problem is in the Dropout, but I don't understand the value.
Can someone explain this to me?
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(6, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.Adam(lr=1e-4), metrics=['acc'])
-------------------------------------------------------
history = model.fit(
    train_data,
    train_labels,
    epochs=30,
    validation_data=(test_data, test_labels),
)
And here is the result:
Epoch 15/30
5/5 [==============================] - 0s 34ms/step - loss: 0.3987 - acc: 0.8536 - val_loss: 0.7021 - val_acc: 0.7143
Epoch 16/30
5/5 [==============================] - 0s 31ms/step - loss: 0.3223 - acc: 0.8891 - val_loss: 0.6393 - val_acc: 0.7778
Epoch 17/30
5/5 [==============================] - 0s 32ms/step - loss: 0.3321 - acc: 0.9082 - val_loss: 0.6229 - val_acc: 0.7460
Epoch 18/30
5/5 [==============================] - 0s 31ms/step - loss: 0.2615 - acc: 0.9409 - val_loss: 0.6591 - val_acc: 0.8095
Epoch 19/30
5/5 [==============================] - 0s 32ms/step - loss: 0.2161 - acc: 0.9857 - val_loss: 0.6368 - val_acc: 0.7143
Epoch 20/30
5/5 [==============================] - 0s 33ms/step - loss: 0.1773 - acc: 0.9857 - val_loss: 0.5644 - val_acc: 0.7778
Epoch 21/30
5/5 [==============================] - 0s 32ms/step - loss: 0.1650 - acc: 0.9782 - val_loss: 0.5459 - val_acc: 0.8413
Epoch 22/30
5/5 [==============================] - 0s 31ms/step - loss: 0.1534 - acc: 0.9789 - val_loss: 0.5738 - val_acc: 0.7460
Epoch 23/30
5/5 [==============================] - 0s 32ms/step - loss: 0.1205 - acc: 0.9921 - val_loss: 0.5351 - val_acc: 0.8095
Epoch 24/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0967 - acc: 1.0000 - val_loss: 0.5256 - val_acc: 0.8413
Epoch 25/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0736 - acc: 1.0000 - val_loss: 0.5493 - val_acc: 0.7937
Epoch 26/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0826 - acc: 1.0000 - val_loss: 0.5342 - val_acc: 0.8254
Epoch 27/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0687 - acc: 1.0000 - val_loss: 0.5452 - val_acc: 0.8254
Epoch 28/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0571 - acc: 1.0000 - val_loss: 0.5176 - val_acc: 0.7937
Epoch 29/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0549 - acc: 1.0000 - val_loss: 0.5142 - val_acc: 0.8095
Epoch 30/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0479 - acc: 1.0000 - val_loss: 0.5243 - val_acc: 0.8095
I never got past 70% accuracy on average before, but here I get 80%; I think I'm overfitting, though. I have of course searched through different docs, but I'm lost.
Have you tried the following in your training:
Data Augmentation
Pre-trained Model
Looking at the execution time per epoch, it looks like your dataset is pretty small. Also, it's not clear whether there is any class imbalance in your dataset. You should probably try stratified CV training and analyze the per-fold results. It won't prevent overfitting, but it will give you more insight into your model, which generally helps to reduce overfitting. Preventing overfitting is a broad topic, however; search online for resources. You can also try this:
model.compile(loss='categorical_crossentropy',
              optimizer='adam', metrics=['acc'])
-------------------------------------------------------
# src: https://keras.io/api/callbacks/reduce_lr_on_plateau/
# Reduce the learning rate by a factor of 0.2 if val_loss
# doesn't improve within 5 epochs.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.2,
                                                 patience=5, min_lr=0.00001)

# src: https://keras.io/api/callbacks/early_stopping/
# Stop training if val_loss doesn't improve within 15 epochs.
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=15)

history = model.fit(
    train_data,
    train_labels,
    epochs=30,
    validation_data=(test_data, test_labels),
    callbacks=[reduce_lr, early_stop]
)
You may also find it useful to use ModelCheckpoint or LearningRateScheduler. None of this guarantees that there will be no overfitting, but these are some approaches worth adopting.
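For completeness, a minimal ModelCheckpoint sketch that keeps only the weights achieving the best validation loss so far (the file name is just an example):

# src: https://keras.io/api/callbacks/model_checkpoint/
# Save only the model that achieves the best val_loss seen so far.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    'best_model.h5',  # example file name
    monitor='val_loss',
    save_best_only=True)
history = model.fit(
    train_data,
    train_labels,
    epochs=30,
    validation_data=(test_data, test_labels),
    callbacks=[reduce_lr, early_stop, checkpoint]
)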

Validation Accuracy not improving CNN

I am fairly new to deep learning, and right now I am trying to predict consumer choices based on EEG data. The total dataset consists of 1045 EEG recordings, each with a corresponding label indicating Like or Dislike for a product. Classes are distributed as follows (44% Likes and 56% Dislikes). I read that convolutional neural networks are suitable for working with raw EEG data, so I tried to implement a network based on Keras with the following structure:
from sklearn.model_selection import train_test_split
# Additional imports implied by the code below
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense
from keras.optimizers import Adam
from keras.optimizers import SGD

X_train, X_test, y_train, y_test = train_test_split(full_data, target, test_size=0.20, random_state=42)
y_train = np.asarray(y_train).astype('float32').reshape((-1, 1))
y_test = np.asarray(y_test).astype('float32').reshape((-1, 1))

# X_train.shape = ((836, 512, 14))
# y_train.shape = ((836, 1))

model = Sequential()
model.add(Conv1D(16, kernel_size=3, activation="relu", input_shape=(512, 14)))
model.add(MaxPooling1D())
model.add(Conv1D(8, kernel_size=3, activation="relu"))
model.add(MaxPooling1D())
model.add(Flatten())
model.add(Dense(1, activation="sigmoid"))

model.compile(optimizer=Adam(lr=0.001), loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=20, batch_size=64)
When I fit the model, however, the validation accuracy does not change at all, with the following output:
Epoch 1/20
14/14 [==============================] - 0s 32ms/step - loss: 292.6353 - accuracy: 0.5383 - val_loss: 0.7884 - val_accuracy: 0.5407
Epoch 2/20
14/14 [==============================] - 0s 7ms/step - loss: 1.3748 - accuracy: 0.5598 - val_loss: 0.8860 - val_accuracy: 0.5502
Epoch 3/20
14/14 [==============================] - 0s 6ms/step - loss: 1.0537 - accuracy: 0.5598 - val_loss: 0.7629 - val_accuracy: 0.5455
Epoch 4/20
14/14 [==============================] - 0s 6ms/step - loss: 0.8827 - accuracy: 0.5598 - val_loss: 0.7010 - val_accuracy: 0.5455
Epoch 5/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7988 - accuracy: 0.5598 - val_loss: 0.8689 - val_accuracy: 0.5407
Epoch 6/20
14/14 [==============================] - 0s 6ms/step - loss: 1.0221 - accuracy: 0.5610 - val_loss: 0.6961 - val_accuracy: 0.5455
Epoch 7/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7415 - accuracy: 0.5598 - val_loss: 0.6945 - val_accuracy: 0.5455
Epoch 8/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7381 - accuracy: 0.5574 - val_loss: 0.7761 - val_accuracy: 0.5455
Epoch 9/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7326 - accuracy: 0.5598 - val_loss: 0.6926 - val_accuracy: 0.5455
Epoch 10/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7338 - accuracy: 0.5598 - val_loss: 0.6917 - val_accuracy: 0.5455
Epoch 11/20
14/14 [==============================] - 0s 7ms/step - loss: 0.7203 - accuracy: 0.5610 - val_loss: 0.6916 - val_accuracy: 0.5455
Epoch 12/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7192 - accuracy: 0.5610 - val_loss: 0.6914 - val_accuracy: 0.5455
Epoch 13/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7174 - accuracy: 0.5610 - val_loss: 0.6912 - val_accuracy: 0.5455
Epoch 14/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7155 - accuracy: 0.5610 - val_loss: 0.6911 - val_accuracy: 0.5455
Epoch 15/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7143 - accuracy: 0.5610 - val_loss: 0.6910 - val_accuracy: 0.5455
Epoch 16/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7129 - accuracy: 0.5610 - val_loss: 0.6909 - val_accuracy: 0.5455
Epoch 17/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7114 - accuracy: 0.5610 - val_loss: 0.6907 - val_accuracy: 0.5455
Epoch 18/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7103 - accuracy: 0.5610 - val_loss: 0.6906 - val_accuracy: 0.5455
Epoch 19/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7088 - accuracy: 0.5610 - val_loss: 0.6906 - val_accuracy: 0.5455
Epoch 20/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7075 - accuracy: 0.5610 - val_loss: 0.6905 - val_accuracy: 0.5455
Thanks in advance for any insights!
The phenomenon you have run into is called underfitting. This happens when the amount or quality of your training data is insufficient, or when your network architecture is too small and not capable of learning the problem.
Try normalizing your input data, and experiment with different network architectures, learning rates, and activation functions.
As @Muhammad Shahzad stated in his comment, adding some Dense layers after flattening would be a concrete architecture adaptation you should try.
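A minimal sketch of that suggestion, reusing the Conv1D front end from the question (the hidden layer sizes are example values, not tuned):

# Sketch: same front end as the question, plus hidden Dense layers after Flatten.
# Also consider normalizing X_train/X_test (e.g., z-scoring per channel) before fitting.
model = Sequential()
model.add(Conv1D(16, kernel_size=3, activation="relu", input_shape=(512, 14)))
model.add(MaxPooling1D())
model.add(Conv1D(8, kernel_size=3, activation="relu"))
model.add(MaxPooling1D())
model.add(Flatten())
model.add(Dense(64, activation="relu"))  # example hidden layers
model.add(Dense(32, activation="relu"))
model.add(Dense(1, activation="sigmoid"))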
You can also increase the number of epochs, and you should increase the size of the dataset. And you can also use:
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
    channel_shift_range=0.2,
    fill_mode='nearest'
)
to feed the model more (augmented) data, and I hope you can increase the validation accuracy.

Different converging for different training data representations (Numpy array and TensorFlow Dataset API)

I am using Keras with the TensorFlow backend to train an LSTM network on some time-sequential datasets. The performance seems pretty good when I represent my training data (as well as the validation data) in Numpy array format:
train_x.shape: (128346, 10, 34)
val_x.shape: (7941, 10, 34)
test_x.shape: (24181, 10, 34)
train_y.shape: (128346, 2)
val_y.shape: (7941, 2)
test_y.shape: (24181, 2)
P.S. 10 is the number of time steps and 34 is the number of features; the labels were one-hot encoded.
model = tf.keras.Sequential()
model.add(layers.LSTM(_HIDDEN_SIZE, return_sequences=True,
                      input_shape=(_TIME_STEPS, _FEATURE_DIMENTIONS)))
model.add(layers.Dropout(0.4))
model.add(layers.LSTM(_HIDDEN_SIZE, return_sequences=True))
model.add(layers.Dropout(0.3))
model.add(layers.TimeDistributed(layers.Dense(_NUM_CLASSES)))
model.add(layers.Flatten())
model.add(layers.Dense(_NUM_CLASSES, activation='softmax'))

opt = tf.keras.optimizers.Adam(lr=_LR)
model.compile(optimizer=opt, loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit(train_x,
          train_y,
          epochs=_EPOCH,
          batch_size=_BATCH_SIZE,
          verbose=1,
          validation_data=(val_x, val_y)
          )
And the training results are:
Train on 128346 samples, validate on 7941 samples
Epoch 1/10
128346/128346 [==============================] - 50s 390us/step - loss: 0.5883 - acc: 0.6975 - val_loss: 0.5242 - val_acc: 0.7416
Epoch 2/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.4804 - acc: 0.7687 - val_loss: 0.4265 - val_acc: 0.8014
Epoch 3/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.4232 - acc: 0.8076 - val_loss: 0.4095 - val_acc: 0.8096
Epoch 4/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.3894 - acc: 0.8276 - val_loss: 0.3529 - val_acc: 0.8469
Epoch 5/10
128346/128346 [==============================] - 49s 382us/step - loss: 0.3610 - acc: 0.8430 - val_loss: 0.3283 - val_acc: 0.8593
Epoch 6/10
128346/128346 [==============================] - 49s 382us/step - loss: 0.3402 - acc: 0.8525 - val_loss: 0.3334 - val_acc: 0.8558
Epoch 7/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.3233 - acc: 0.8604 - val_loss: 0.2944 - val_acc: 0.8741
Epoch 8/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.3087 - acc: 0.8663 - val_loss: 0.2786 - val_acc: 0.8805
Epoch 9/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.2969 - acc: 0.8709 - val_loss: 0.2785 - val_acc: 0.8777
Epoch 10/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.2867 - acc: 0.8757 - val_loss: 0.2590 - val_acc: 0.8877
This log seems pretty normal, but when I tried to use the TensorFlow Dataset API to represent my datasets, the training process behaved very strangely (it seems that the model starts to overfit/underfit?):
def tfdata_generator(features, labels, is_training=False, batch_size=_BATCH_SIZE, epoch=_EPOCH):
    dataset = tf.data.Dataset.from_tensor_slices((features, tf.cast(labels, dtype=tf.uint8)))
    if is_training:
        dataset = dataset.shuffle(10000)  # depends on sample size
    dataset = dataset.batch(batch_size, drop_remainder=True).repeat(epoch).prefetch(batch_size)
    return dataset

training_set = tfdata_generator(train_x, train_y, is_training=True)
validation_set = tfdata_generator(val_x, val_y, is_training=False)
testing_set = tfdata_generator(test_x, test_y, is_training=False)
Training on the same model and hyperparameters:
model.fit(
    training_set.make_one_shot_iterator(),
    epochs=_EPOCH,
    steps_per_epoch=len(train_x) // _BATCH_SIZE,
    verbose=1,
    validation_data=validation_set.make_one_shot_iterator(),
    validation_steps=len(val_x) // _BATCH_SIZE
)
And the log looks very different from the previous one:
Epoch 1/10
2005/2005 [==============================] - 54s 27ms/step - loss: 0.1451 - acc: 0.9419 - val_loss: 3.2980 - val_acc: 0.4975
Epoch 2/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1675 - acc: 0.9371 - val_loss: 3.0838 - val_acc: 0.4975
Epoch 3/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1821 - acc: 0.9316 - val_loss: 3.1212 - val_acc: 0.4975
Epoch 4/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1902 - acc: 0.9287 - val_loss: 3.0032 - val_acc: 0.4975
Epoch 5/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1905 - acc: 0.9283 - val_loss: 2.9671 - val_acc: 0.4975
Epoch 6/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1867 - acc: 0.9299 - val_loss: 2.8734 - val_acc: 0.4975
Epoch 7/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1802 - acc: 0.9316 - val_loss: 2.8651 - val_acc: 0.4975
Epoch 8/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1740 - acc: 0.9350 - val_loss: 2.8793 - val_acc: 0.4975
Epoch 9/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1660 - acc: 0.9388 - val_loss: 2.7894 - val_acc: 0.4975
Epoch 10/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1613 - acc: 0.9405 - val_loss: 2.7997 - val_acc: 0.4975
The validation loss does not decrease, and val_acc stays at the same value when I use the TensorFlow Dataset API to represent my data.
My questions are:
Based on the same model and parameters, why does model.fit() produce such different training results when I merely adopt the tf.data.Dataset API?
What is the difference between these two mechanisms?
model.fit(train_x,
          train_y,
          epochs=_EPOCH,
          batch_size=_BATCH_SIZE,
          verbose=1,
          validation_data=(val_x, val_y)
          )
vs
model.fit(
    training_set.make_one_shot_iterator(),
    epochs=_EPOCH,
    steps_per_epoch=len(train_x) // _BATCH_SIZE,
    verbose=1,
    validation_data=validation_set.make_one_shot_iterator(),
    validation_steps=len(val_x) // _BATCH_SIZE
)
How can I solve this strange problem if I have to use the tf.data.Dataset API?
