Training VGG16 from scratch doesn't improve accuracy in Keras

Training VGG16 from scratch doesn't improve accuracy in Keras - python

I'm trying to train VGG16 models using both transfer learning and training from scratch. I have a dataset with 7k images per category, and 4 different categories. I managed to come up with the transfer learning code no problem, however, the same program but for training from scratch does not seem to be working.
creating the model for transfer learning:
base_model = apps.VGG16(
include_top=False, # This is if we want the final FC layers
weights="imagenet",
input_shape=input_shape,
classifier_activation="softmax",
pooling = pooling,
)
# Freeze the base model
for layer in base_model.layers:
layer.trainable = False
# convert output of base model to a 1D vector
x = Flatten()(base_model.output)
# We create fc_count fully connected layers, relu for all but the last
x = Dense(units=4096, activation='relu')(x) # relu avoids vanishing gradient problem
x = Dense(units=4096, activation='relu')(x) # relu avoids vanishing gradient problem
# The final layer is a softmax layer
prediction = Dense(4, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=prediction)
model.compile(loss='categorical_crossentropy',
optimizer=optimizers.Adam(learning_rate=0.001),
metrics=['accuracy'])
Meanwhile, for training from scratch:
model = apps.VGG16(
include_top=True, # This is if we want the final FC layers
weights=None,
input_shape=input_shape,
classifier_activation="softmax",
pooling = pooling,
classes = 4 # set the number of outputs to required count
)
model.compile(loss='categorical_crossentropy',
optimizer=optimizers.Adam(learning_rate=0.1), # I've experimented w values as low as 0.001
metrics=['accuracy'])
model.summary()
and the training is done via
history = model.fit(train_images,
validation_data=val_images,
epochs=epochs,
verbose=1, callbacks=callbacks)
Transfer learning takes around 10 epochs to converge, whereas I've gone up to 20 epochs when training from scratch, converging to an accuracy and val_accuracy of exactly 0.2637. I have a ReduceLROnPlateau that does make a difference when transfer learning.
I'm training on a NVIDIA GeForce RTX 3060 Laptop GPU.
EDIT: I should mention that I am getting loss of nan when training from scratch

Problem got resolved by switching to the SGD optimizer

Related

Why is my transfer learning implementation of VGG19 not improving accuracy?

I want to use the pretrained VGG19 (with imagenet weights) to build a two class classifier using a dataset of about 2.5k images that i've curated and split into 2 classes. It seems that not only is training taking a very long time, but accuracy seems to not increase in the slightest.
Here's my implementation:
def transferVGG19(train_dataset, val_dataset):
# conv_model = VGG19(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
conv_model = VGG19(
include_top=True,
weights="imagenet",
input_tensor=None,
input_shape=(224, 224, 3),
pooling=None,
classes=1000,
classifier_activation="softmax",
)
for layer in conv_model.layers:
layer.trainable = False
input = layers.Input(shape=(224, 224, 3))
scale_layer = layers.Rescaling(scale=1 / 127.5, offset=-1)
x = scale_layer(input)
x = conv_model(x, training=False)
x = layers.Dense(256, activation='relu')(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(64, activation='relu')(x)
predictions = layers.Dense(1, activation='softmax')(x)
full_model = models.Model(inputs=input, outputs=predictions)
full_model.summary()
full_model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['acc'])
history = full_model.fit(
train_dataset,
epochs=10,
validation_data=val_dataset,
workers=10,
)
Model performance seems to be awful...
I imagine this behaviour comes from my rudimentary understanding of how layers work and how to best the new model's architecture. As VGG19 is trained on 1000 classes, i saw it best fit to add to the output a couple of dense layers to reduce the size of the feature maps, as well as a dropout layer in between to randomly discard neurons and help ease the risk of overfitting. At first i suspected i might have dropped too many neurons, but i was expecting my network to learn slower rather than not at all.
Is there something obviously wrong in my implementation that would cause such poor performance? Any explanation is welcomed. Just to mention, i would rule out the dataset as an issue because i've implemented transfer learning on Xception and have managed to get 98% validation accuracy that was monotonously increasing over 20 epochs. That implementation used different layers (i can provide it if necessary) because i was experimenting with different network layouts.

TLDR; Change include_top= True to False
Explaination-
Model graphs are represented in inverted manner i.e last layers are shown at the top and initial layers are shown at bottom.
When include_top=False, the top dense layers which are used for classification and not representation of data are removed from the pretrained VGG model. Only till the last conv2D layers are preserved.
During transfer-learning, you need to keep the learned representation layers intact and only learn the classification part for your data. Hence you are adding your stack of classification layers i.e.
x = layers.Dense(256, activation='relu')(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(64, activation='relu')(x)
predictions = layers.Dense(1, activation='softmax')(x)
If you keep the top classification layers of VGG, it will give 1000 probabilities for 1000 classes due to softmax activation at its top layer in model graph.This activation is not relu. We dont need softmax in intermediate layer as softmax "squishes" the unscaled inputs so that sum(input) = 1. Effectively it produces a smooth software defined approximation of argmax. Hence your accuracy is suffering.

Multi class image classification using CNN

I wanted to classify images which consist five classes. I wanted to use CNN. But when I try with several models, the training accuracy will not increase than 20%. Please some one help me to overcome this. Mostly model will trained within 3 epoches and when epoches increase there is no improvement in accuracy. Can anyone suggest me a solution or model or can specify what could be the problem?
Below is one of the model i have used
#defining training and test sets
x_train,x_val,y_train,y_val=train_test_split(x,y,test_size=0.2, random_state=42)
print('Training data and target sizes: \n{}, {}'.format(x_train.shape,y_train.shape))
print('Test data and target sizes: \n{}, {}'.format(x_val.shape,y_val.shape))
Training data and target sizes:
(2398, 224, 224, 3), (2398,)
Test data and target sizes:
(600, 224, 224, 3), (600,)
img_rows, img_cols, img_channel = 224, 224, 3
base_model = applications.inception_v3.InceptionV3(include_top=False, weights='imagenet',pooling='avg', input_shape=(img_rows, img_cols, img_channel))
print(base_model.summary())
#Adding custom Layers
add_model = Sequential()
add_model.add(Dense(1024, activation='relu',input_shape=base_model.output_shape[1:]))
add_model.add(Dropout(0.60))
add_model.add(Dense(1, activation='sigmoid'))
print(add_model.summary())
# creating the final model
model = Model(inputs=base_model.input, outputs=add_model(base_model.output))
# compile the model
opt = optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
reduce_lr = ReduceLROnPlateau(monitor='val_acc',
patience=5,
verbose=1,
factor=0.1,
cooldown=10,
min_lr=0.00001)
model.compile(
loss='categorical_crossentropy',
metrics=['acc'],
optimizer='adam'
)
print(model.summary())
n_fold = 5
kf = model_selection.KFold(n_splits = n_fold, shuffle = True)
eval_fun = metrics.roc_auc_score
model.fit(x_train,y_train,epochs=50,batch_size=50,validation_data=(x_val,y_val))

is it okay could you share the part of the code where you're fitting the model. It's not available in the post.
And since the output is not reproducible due to lack of data, I suggest you go through this link https://www.kaggle.com/kenconstable/alzheimer-s-multi-class-classification
It's really well explained and it has given the best practices of multi-class-classification based on transfer learning as well as from scratch. In case you don't find this helpful, It would be helpful to share the training script including the model.fit() code.
Okay, so here's the issue,
In your code, you may be creating a base model with inception V3, however, you are not really adding that base model to your add_model variable.
Your add_model variable is essentially a dense network and not a CNN. Also, another thing, although it's not a big deal is that you're creating your own optimiser opt and not using it in model.compile
Can you please try this code out and let me know if it works:
# function to build the model
def build_transfer_model(conv_base,dropout,dense_node,learn_rate,metric):
"""
Build and compile a transfer learning model
Input: a base model, dropout rate, the number of filters in the dense node,
the learning rate and performance metrics
Output: A compiled CNN model
"""
# clear previous run
backend.clear_session()
# build the model
model = Sequential()
model.add(conv_base)
model.add(Dropout(dropout))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(dense_node,activation='relu'))
model.add(Dense(1,activation='sigmoid'))
# complile the model
model.compile(
optimizer = tensorflow.keras.optimizers.Adam(lr=learn_rate),
loss = 'categorical_crossentropy',
metrics = metric )
model.summary()
return model
img_rows, img_cols, img_channel = 224, 224, 3
base_model = applications.inception_v3.InceptionV3(include_top=False, weights='imagenet',pooling='avg', input_shape=(img_rows, img_cols, img_channel))
model = build_transfer_model(conv_base=base_model,dropout=0.6,dense_node =1024,learn_rate=0.001,metric=['acc'])
print(model.summary())
model.fit(x_train,y_train,epochs=50,batch_size=50,validation_data=(x_val,y_val))
If you pay attention in the function, the first thing we are adding to the instance of Sequential() is the base layer (InceptionV3 in your case). But you were adding a dense layer directly. Although it may get the weights from the output layer of the base inception V3, it will be a dense network, not a CNN. So please check this out.
I may have changed the variable names, although I have tried not to do the same. And, please change the order of the layers in the build_transfer_model function according to your requirement.
In case it doesn't work, let me know.
Thanks.

You have to use model.fit() to actually train the model after compiling. Right now, it has randomly initialized weights, and is therefore making random predictions. Since you have five classes, the accuracy is approximately 1/5 = 20%. Training your model may take time depending on model size and amount of data you have.

Good training/validation accuracy but poor test accuracy

Ive trained a model to classify 4 types of eye diseases using the VGG16 pretrained model. I am fairly new to machine learning so didn't know what to make out of the results.
After training it for about 6 hours on 90,000 images:
training accuracy kept increasing as well as the loss (went from roughly 2 to 0.8 ended with an accuracy of 88%)
validation loss kept flucating between 1-2 per epoch (accuracy did improve to 85%)
(I accidentally reran the cell so cant see the output)
After looking at the confusion matrix, it seems my test isn't performing well
Image_height = 196
Image_width = 300
val_split = 0.2
batches_size = 10
lr = 0.0001
spe = 512
vs = 32
epoch = 10
#Creating batches
#Creating batches
train_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input,validation_split=val_split) \
.flow_from_directory(directory=train_folder, target_size=(Image_height,Image_width), classes=['CNV','DME','DRUSEN','NORMAL'], batch_size=batches_size,class_mode="categorical",
subset="training")
validation_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input,validation_split=val_split) \
.flow_from_directory(directory=train_folder, target_size=(Image_height,Image_width), classes=['CNV','DME','DRUSEN','NORMAL'], batch_size=batches_size,class_mode="categorical",
subset="validation")
test_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input) \
.flow_from_directory(test_folder, target_size=(Image_height,Image_width),
classes=['CNV','DME','DRUSEN','NORMAL'], batch_size=batches_size,class_mode="categorical")
#Function to create model. We will be using a pretrained model
def create():
vgg16_model = keras.applications.vgg16.VGG16(input_tensor=Input(shape=(Image_height, Image_width, 3)),input_shape=(Image_height,Image_width,3), include_top = False)
model = Sequential()
model.add(vgg16_model)
for layer in model.layers:
layer.trainable = False
model.add(Flatten())
model.add(Dense(4, activation='softmax'))
return model
model = create()
model.compile(Adam(lr=lr),loss="categorical_crossentropy",metrics=['accuracy'])
model.fit(train_batches, steps_per_epoch=spe,
validation_data=validation_batches,validation_steps=vs, epochs=epoch)
Any suggestions on what I can improve on so the confusion matrix isn't doing so poorly? I also have the model saved if its possible to just retrain it with more layers.

A number of issues and recommendations. You are using VGG16 model. That model has over 40 million trainable parameters. On a data set of 90,000 images your training time will be very long. So I recommend you consider using the MobileNet model. It only has 4 million trainable parameters and is essentially just as accurate as VGG16. Documentation is [here.][1] Next irrespective of which model you use you should set the initial weights to the imagenet weights. Your model will start off trained on images.I find I get better results by making all layers in the model trainable. Now you say your model reached an accuracy of 88%. I do not think that is very good. I believe you need to achieve at least 95%. You can do that by using an adjustable learning rate. The keras callback ReduceLROnPlateau makes doing that easy. Documentation is [here.][2] Set it up to monitor validation loss and reduce the learning rate if it fails to decrease on consecutive epochs. Next you want to save the model that has the lowest validation loss and use that to make predictions. The Keras callback ModelCheckpoint can be set up to monitor validation loss and save the model with the lowest loss. Documentation is [here.][3] .
Code below shows how to implement the MobileNet model for your problem and define the callbacks. You will also have to make changes to the generator to use Mobilenet preprocessing and set target size to (224,224). Also I believe you are missing () around the pre-processing function Hope this helps..
mobile = tf.keras.applications.mobilenet.MobileNet( include_top=False,
input_shape=(224, 224,3),
pooling='max', weights='imagenet',
alpha=1, depth_multiplier=1,dropout=.5)
x=mobile.layers[-1].output
x=keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001 )(x)
predictions=Dense (4, activation='softmax')(x)
model = Model(inputs=mobile.input, outputs=predictions)
for layer in model.layers:
layer.trainable=True
model.compile(Adamax(lr=lr), loss='categorical_crossentropy', metrics=['accuracy'])
checkpoint=tf.keras.callbacks.ModelCheckpoint(filepath=save_loc, monitor='val_loss', verbose=0, save_best_only=True,
save_weights_only=False, mode='auto', save_freq='epoch', options=None)
lr_adjust=tf.keras.callbacks.ReduceLROnPlateau( monitor="val_loss", factor=0.5, patience=1, verbose=0, mode="auto",
min_delta=0.00001, cooldown=0, min_lr=0)
callbacks=[checkpoint, lr_adjust]
[1]: http://httphttps://keras.io/api/applications/mobilenet/s://
[2]: https://keras.io/api/callbacks/reduce_lr_on_plateau/
[3]: https://keras.io/api/callbacks/model_checkpoint/

You don't train any layer except the last one.
You need to set the training capability to the last few or add more layers.
Add
tf.keras.applications.VGG16(... weights='imagenet'... )
In your code, the weights are not pretrained on any set.
The available options are explained here:
https://www.tensorflow.org/api_docs/python/tf/keras/applications/VGG16

while adding layers to model you have to remove last dense layer of the model, as your model has four classes but vgg16 has 1000 classes so you have to remove last dense layer then add your own dense layers:
def create():
vgg16_model = keras.applications.vgg16.VGG16(input_tensor=Input(shape=(Image_height, Image_width, 3)),input_shape=(Image_height,Image_width,3), include_top = False)
model = Sequential()
for layer in vgg16_model.layers[:-1]:
model.add(layer)
model.summary()
for layer in model.layers:
layer.trainable = False
model.add(Flatten())
model.add(Dense(4, activation='softmax'))
return model

Vgg16 for gender detection (male,female)

We have used vgg16 and freeze top layers and retrain the last 4 layers on gender dataset 12k male and 12k female. It gives very low accuracy especially for male. We are using the IMDB dataset. On female test data it gives female as output but on male it gives same output.
vgg_conv=VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
Freeze the layers except the last 4 layers
for layer in vgg_conv.layers[:-4]:
layer.trainable = False
Create the model
model = models.Sequential()
Add the vgg convolutional base model
model.add(vgg_conv)
Add new layers
model.add(layers.Flatten())
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dropout(0.5)) model.add(layers.Dense(2, activation='softmax'))
nTrain=16850 nTest=6667
train_datagen = image.ImageDataGenerator(rescale=1./255)
test_datagen = image.ImageDataGenerator(rescale=1./255)
batch_size = 12 batch_size1 = 12
train_generator = train_datagen.flow_from_directory(train_dir, target_size=(224, 224), batch_size=batch_size, class_mode='categorical', shuffle=False)
test_generator = test_datagen.flow_from_directory(test_dir, target_size=(224, 224), batch_size=batch_size1, class_mode='categorical', shuffle=False)
model.compile(optimizer=optimizers.RMSprop(lr=1e-6), loss='categorical_crossentropy', metrics=['acc'])
history = model.fit_generator( train_generator, steps_per_epoch=train_generator.samples/train_generator.batch_size, epochs=3, validation_data=test_generator, validation_steps=test_generator.samples/test_generator.batch_size, verbose=1)
model.save('gender.h5')
Testing Code:
model=load_model('age.h5')
img=load_img('9358807_1980-12-28_2010.jpg', target_size=(224,224))
img=img_to_array(img)
img=img.reshape((1,img.shape[0],img.shape[1],img.shape[2]))
img=preprocess_input(img)
yhat=model.predict(img)
print(yhat.size)
label=decode_predictions(yhat)
label=label[0][0]
print('%s(%.2f%%)'% (label[1],label[2]*100))

Firstly, you are saving the model as gender.h5 and during testing you are loading the model age.h5. Probably you have added different code for the testing here.
Coming to improving the accuracy of the program -
Most importantly is that you are using loss = 'categorical_crossentropy', change it to loss = 'binary_crossentropy' in model.compile as you have just 2 classes. So your
model.compile(optimizer="adam",loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), metrics=['accuracy']) will look like this.
Also change class_mode='categorical' to class_mode='binary' in flow_from_directory.
As categorical_crossentropy goes hand in hand with softmax activation in the last layer, and if you change the loss to binary_crossentropy the last activation should also be changed to sigmoid. So last layer should be Dense(1, activation='sigmoid').
You have added 2 Dense layers of 4096, this will add 4096 * 4096 = ‭16,777,216‬ weights to be learnt by the model. Reduce them may be to 1026 and 512 respectively.
You have added Dropout layer of 0.5, that is to keep the 50% of neurons off during the epoch. That is huge number. Better is to drop off the Dropout layer and use to only if your model is overfitting.
Set batch_size = 1. As you have very less input let every epoch have same number of steps as input records.
Use Data Augmentation technique like horizontal_flip, vertical_flip, shear_range, zoom_range of ImageDataGenerator to generate the new batches of training and validation images during every epoch.
Train your model for large number of epoch. You are just training for epoch=3, that is too less for learning the weights. Train for epoch=50 and later trim the number.
Hope this answers your question. Happy Learning.

Validation accuracy is low and not increasing while training accuracy is increasing

I am a newbie to Keras and machine learning in general. I’m trying to build a classification model using the Sequential model. After some experiments, I see that my validation accuracy behavior is very low and not increasing, although the training accuracy works well. I added regularization parameters to the layers and dropouts also in between the layers. Still, the behavior exists. Here’s my code.
from keras.regularizers import l2
model = keras.models.Sequential()
model.add(keras.layers.Conv1D(filters=32, kernel_size=1, strides=1, padding="SAME", activation="relu", input_shape=[512,1],kernel_regularizer=keras.regularizers.l2(l=0.1))) # 一定要加 input shape
keras.layers.Dropout=0.35
model.add(keras.layers.MaxPool1D(pool_size=1,activity_regularizer=l2(0.01)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(256, activation="softmax",activity_regularizer=l2(0.01)))
model.compile(loss="sparse_categorical_crossentropy",
optimizer="adam",
metrics=["accuracy"])
Ahistory = model.fit(train_x, trainy, epochs=300,
validation_split = 0.2,
batch_size = 16)
And here is the final results I got.
What is the reason behind this.? How do I fine-tune the model.?

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.