Transfer learning using saved bottleneck values (inference with full model) - python

I'm using the latest Keras with the TensorFlow backend.
I'm not quite sure of the correct way to put together the full model for inference if I trained a smaller version of the model on bottleneck values.
# Save bottleneck values
from keras.applications.xception import Xception
base_model = Xception(weights='imagenet', include_top=False)
prediction = base_model.predict(x)
# *** save bottleneck data ***
Now let's say my full model looks something like this:
base_model = Xception(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(classes, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
To speed up training, though, I want to bypass the earlier layers by loading bottleneck values, so I create a smaller model that includes only the new layers. I then train and save that model.
bottleneck_input = Input(shape = bottleneck_shape)
x = GlobalAveragePooling2D() (bottleneck_input)
x = Dense(1024, activation='relu')(x)
predictions = Dense(classes, activation='softmax')(x)
model = Model(inputs=bottleneck_input, outputs=predictions)
save_full_model() #save model
After training this smaller model, I want to run inference on the full model, so I need to put the base model and the smaller model together. I'm not sure of the best way to do this.
base_model = Xception(weights='imagenet', include_top=False)
#x = base_model.output
loaded_model = load_model() # load bottleneck model
#now to combine both models (something like this?)
Model(inputs = base_model.inputs, outputs = loaded_model.outputs)
What is the proper way to put together the model for inference?
I don't know if there is a way to use my full model for training, but start from the bottleneck layers for training and from the input layer for inference. (Please note this is not the same as freezing layers, which only stops the weights from being updated but still computes every layer for each data point.)

Every model is also a layer, with extra properties such as a loss function. So you can use a model like a layer in the functional API. In your case it could look like:
input = Input(...)
base_model = Xception(weights='imagenet', include_top=False)
# Apply model to input like layer
base_output = base_model(input)
loaded_model = load_model()
# Now the bottleneck model
out = loaded_model(base_output)
final_model = Model(input, out) # New computation graph
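To check that the combined graph gives the same result as running the two models one after the other, here is a minimal sketch. It assumes tiny stand-in networks instead of Xception and the saved head (the names `base_model`, `head_model`, and the 32x32 input size are only for illustration), so it runs quickly and needs no weight download:

```python
import numpy as np
import keras
from keras import layers

# Stand-in for the pretrained feature extractor (Xception in the question).
base_in = keras.Input(shape=(32, 32, 3))
base_out = layers.Conv2D(8, 3, activation="relu")(base_in)
base_model = keras.Model(base_in, base_out, name="base")

# Stand-in for the small model trained on saved bottleneck values.
bottleneck_shape = tuple(base_out.shape[1:])
head_in = keras.Input(shape=bottleneck_shape)
h = layers.GlobalAveragePooling2D()(head_in)
h = layers.Dense(16, activation="relu")(h)
head_out = layers.Dense(4, activation="softmax")(h)
head_model = keras.Model(head_in, head_out, name="head")

# Call each model like a layer to build a single inference graph.
image_input = keras.Input(shape=(32, 32, 3))
full_model = keras.Model(image_input, head_model(base_model(image_input)))

images = np.random.rand(2, 32, 32, 3).astype("float32")
# Two-step path: compute bottleneck values, then feed them to the head.
two_step = head_model.predict(base_model.predict(images, verbose=0), verbose=0)
# One-step path: the combined model.
one_step = full_model.predict(images, verbose=0)
```

Both paths share the same layer objects, so the head's trained weights are reused rather than copied.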

Related

Why is my transfer learning implementation of VGG19 not improving accuracy?

I want to use the pretrained VGG19 (with ImageNet weights) to build a two-class classifier using a dataset of about 2.5k images that I've curated and split into 2 classes. Not only is training taking a very long time, but accuracy does not seem to increase in the slightest.
Here's my implementation:
def transferVGG19(train_dataset, val_dataset):
    # conv_model = VGG19(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
    conv_model = VGG19(
        include_top=True,
        weights="imagenet",
        input_tensor=None,
        input_shape=(224, 224, 3),
        pooling=None,
        classes=1000,
        classifier_activation="softmax",
    )
    for layer in conv_model.layers:
        layer.trainable = False
    input = layers.Input(shape=(224, 224, 3))
    scale_layer = layers.Rescaling(scale=1 / 127.5, offset=-1)
    x = scale_layer(input)
    x = conv_model(x, training=False)
    x = layers.Dense(256, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(64, activation='relu')(x)
    predictions = layers.Dense(1, activation='softmax')(x)
    full_model = models.Model(inputs=input, outputs=predictions)
    full_model.summary()
    full_model.compile(loss='binary_crossentropy',
                       optimizer='adam',
                       metrics=['acc'])
    history = full_model.fit(
        train_dataset,
        epochs=10,
        validation_data=val_dataset,
        workers=10,
    )
Model performance seems to be awful...
I imagine this behaviour comes from my rudimentary understanding of how layers work and how best to design the new model's architecture. As VGG19 is trained on 1000 classes, I thought it best to add a couple of dense layers to the output to reduce the size of the feature maps, with a dropout layer in between to randomly discard neurons and reduce the risk of overfitting. At first I suspected I might have dropped too many neurons, but I was expecting my network to learn more slowly rather than not at all.
Is there something obviously wrong in my implementation that would cause such poor performance? Any explanation is welcome. Just to mention, I would rule out the dataset as an issue because I've implemented transfer learning on Xception and got 98% validation accuracy that increased monotonically over 20 epochs. That implementation used different layers (I can provide it if necessary) because I was experimenting with different network layouts.
TL;DR: Change include_top=True to False.
Explanation:
Model graphs are conventionally drawn inverted, i.e. the last layers are shown at the top and the initial layers at the bottom.
When include_top=False, the top dense layers, which are used for classification rather than for representing the data, are removed from the pretrained VGG model. Only the layers up to and including the last convolutional block are kept.
During transfer learning, you need to keep the learned representation layers intact and learn only the classification part for your data. Hence you add your own stack of classification layers, i.e.
x = layers.Dense(256, activation='relu')(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(64, activation='relu')(x)
predictions = layers.Dense(1, activation='softmax')(x)
If you keep the top classification layers of VGG, it will output 1000 probabilities for the 1000 ImageNet classes, because of the softmax activation at its top layer in the model graph. This activation is not ReLU. We don't need softmax in an intermediate layer, as softmax "squishes" the unscaled inputs so that the outputs sum to 1. Effectively it produces a smooth ("soft") approximation of argmax. Hence your accuracy is suffering.
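One more thing that may be worth checking in the question's code: its last layer is Dense(1, activation='softmax'). Softmax normalises across the units of a layer, so with a single unit the output is 1.0 for every input and nothing can be learned from it. A sigmoid is the usual choice for a single-unit binary output. The sketch below uses plain NumPy (the helper names `softmax` and `sigmoid` are mine, not from the question) to show the difference:

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def sigmoid(logits):
    return 1.0 / (1.0 + np.exp(-logits))

# Three samples, one output unit each, as produced by Dense(1).
logits = np.array([[-3.0], [0.0], [5.0]])

softmax_out = softmax(logits)  # normalised over a single unit: always 1.0
sigmoid_out = sigmoid(logits)  # varies with the logit, usable with binary_crossentropy
```

With binary_crossentropy, that means the final layer would normally be Dense(1, activation='sigmoid').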

Get output of any layer from a saved Keras model [duplicate]

I am developing an autoencoder for clustering certain groups of images.
input_images->...->bottleneck->...->output_images
I have calibrated the autoencoder to my satisfaction and saved the model; everything has been developed using tensorflow.keras on Python 3.
The next step is to apply the autoencoder to a ton of images and cluster them according to cosine distance in the bottleneck layer. Oops, I just realized that I don't know the syntax in keras.tf for running the model on a batch up to a specific layer rather than to the output layer. Thus the question:
How do I run something like Model.predict_on_batch or Model.predict_generator up to the certain "bottleneck" layer and retrieve the values on that layer rather than the values on the output layer?
You need to define a new model (if you didn't define the encoder and decoder as separate models initially, which is usually the easiest option).
If your model was defined without reusing layers, it's just:
inputs = model.input
outputs= model.get_layer('bottleneck').output
encoder = Model(inputs, outputs)
Use the encoder model as any other model.
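The full code would look like this:
from keras import models
from keras.layers import Input, Dense
# ENCODER
encoding_dim = 37310  # length of each flattened input vector
input_layer = Input(shape=(encoding_dim,))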
encoder = Dense(500, activation='tanh')(input_layer)
encoder = Dense(100, activation='tanh')(encoder)
encoder = Dense(50, activation='tanh', name='bottleneck_layer')(encoder)
decoder = Dense(100, activation='tanh')(encoder)
decoder = Dense(500, activation='tanh')(decoder)
decoder = Dense(37310, activation='sigmoid')(decoder)
# full model
model_full = models.Model(input_layer, decoder)
model_full.compile(optimizer='adam', loss='mse')
model_full.fit(x, y, epochs=20, batch_size=16)
# bottleneck model
bottleneck_output = model_full.get_layer('bottleneck_layer').output
model_bottleneck = models.Model(inputs = model_full.input, outputs = bottleneck_output)
bottleneck_predictions = model_bottleneck.predict(X_test)
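For the clustering step mentioned in the question, the bottleneck values can then be compared by cosine distance. A minimal NumPy sketch, assuming `bottleneck_predictions` is a 2-D array of shape (n_samples, 50); random data stands in for the real predictions here, and the helper name `cosine_distance_matrix` is just for illustration:

```python
import numpy as np

def cosine_distance_matrix(codes):
    """Pairwise cosine distance (1 - cosine similarity) between rows."""
    norms = np.linalg.norm(codes, axis=1, keepdims=True)
    unit = codes / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return 1.0 - unit @ unit.T

# Placeholder for model_bottleneck.predict(X_test).
rng = np.random.default_rng(0)
bottleneck_predictions = rng.normal(size=(5, 50))

distances = cosine_distance_matrix(bottleneck_predictions)
```

The resulting matrix can be passed to any clustering routine that accepts precomputed distances.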

Adding fully connected layers after pretrained model

I'm new to ConvNets and Python and want to implement the following:
I want to use the pretrained vgg16 model and add 3 fully connected layers after it with an L2-Normalization at the end.
So Data->VGG16->FC (1x4096)->FC (1x4096)->FC (1x3)->L2-Norm->Output
The first and second FC layers each output a 1x4096 array; the last FC layer outputs a 1x3 array, on which the L2-Norm is performed.
Can anyone give me a hint on how to do that?
I found that I can load the model like this:
model_vgg19 = models.vgg19(pretrained=True)
But how can I add the FCs and the L2-Norm after that? And how can I pass test data through the model?
I'm quoting an example mentioned in Keras#3465
In the Keras framework, if you pass include_top=False when loading your pre-trained model, it will not include the final classification layers. You can add your custom FC layers at the end as shown in the example below:
#load vgg16 without dense layer and with theano dim ordering
base_model = VGG16(weights = 'imagenet', include_top = False, input_shape = (3,224,224))
#number of classes in your dataset e.g. 20
num_classes = 20
x = Flatten()(base_model.output)
x = Dense(4096, activation='relu')(x)
x = Dropout(0.5)(x)
x = BatchNormalization()(x)
predictions = Dense(num_classes, activation = 'softmax')(x)
#create graph of your new model
head_model = Model(inputs = base_model.input, outputs = predictions)
#compile the model
head_model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
head_model.summary()
.
.
.
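#train your model on data
#(x_train, y_train are placeholder names for your training arrays; x above now holds a Keras tensor)
head_model.fit(x_train, y_train, batch_size = batch_size, verbose = 1)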
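The quoted example ends in a softmax classifier, while the question asks for three FC layers followed by an L2 normalisation. A minimal sketch of such a head, assuming `keras.layers.UnitNormalization` is available in your Keras version (it L2-normalises along the given axis). The small Conv2D layer is only a stand-in for the VGG16 base, and the 64-unit layers stand in for the 4096-unit ones so the example runs quickly:

```python
import numpy as np
import keras
from keras import layers

inputs = keras.Input(shape=(32, 32, 3))
# Stand-in for the pretrained VGG16 convolutional base.
features = layers.Conv2D(4, 3, activation="relu")(inputs)

x = layers.Flatten()(features)
x = layers.Dense(64, activation="relu")(x)  # FC 1 (4096 units in the question)
x = layers.Dense(64, activation="relu")(x)  # FC 2 (4096 units in the question)
x = layers.Dense(3)(x)                      # FC 3, 1x3 output
outputs = layers.UnitNormalization(axis=-1)(x)  # L2-normalise each 1x3 vector

model = keras.Model(inputs, outputs)

# Passing test data through the model is a plain predict call.
batch = np.random.rand(5, 32, 32, 3).astype("float32")
embeddings = model.predict(batch, verbose=0)
norms = np.linalg.norm(embeddings, axis=1)  # each should be close to 1
```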

Sudden spike in validation loss

I am doing binary image classification on a small dataset containing 250 images in each class. I am using transfer learning with ResNet50 as the base network architecture, and on top of it I've added two hidden layers and one final output layer. After training for 20 epochs, what I saw is that the loss suddenly increases in the initial epochs, and I am unable to understand the reason behind it.
Network architecture -
image_input = Input(shape=(224, 224, 3))
model = ResNet50(input_tensor=image_input,include_top=True, weights='imagenet')
last_layer = model.get_layer('avg_pool').output
x = Flatten(name='flatten')(last_layer)
x = Dense(1000, activation='relu', name='fc1000')(x)
x = Dropout(0.5)(x)
x = Dense(200, activation='relu', name='fc200')(x)
x = Dropout(0.5)(x)
out = Dense(num_classes, activation='softmax', name='output')(x)
custom_model = Model(image_input, out)
I am using binary_crossentropy, Adam with default parameters
Loss: [plot not available]
Accuracy: [plot not available]
With such a small amount of data per class, there is definitely a chance of overfitting. Increase your dataset size and check again; use data augmentation if possible.
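As a rough illustration of the data augmentation suggestion, here is a sketch that uses Keras preprocessing layers. The particular transforms and their strengths are assumptions chosen for illustration, not something tuned for the asker's data:

```python
import numpy as np
import keras
from keras import layers

# Random transforms; they are only applied when training=True.
augment = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.1),
    ]
)

# Placeholder batch with the same shape as the question's inputs.
images = np.random.rand(4, 224, 224, 3).astype("float32")
augmented = augment(images, training=True)
```

The same `augment` block can be placed directly after the Input layer of the model, so that each epoch sees slightly different versions of the 500 images.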

Defining model in Keras

I am new to Deep learning and Keras. What does pretrained weights initialization weights='imagenet' mean when used to define a model in Keras?
ResNet50(weights='imagenet')
Thanks!
This line of code creates a network architecture known as ResNet50 (you can find more information about it here). Passing weights='imagenet' makes Keras load weights for this network that were obtained by training it on the ImageNet data set. Without this information Keras would only be able to prepare the network architecture but would not be able to set any of the weights to "good" values, as it does not know the purpose of the model. This is determined by specifying the data set.
If you are using another data set, then you are using the model as a pre-trained model. You can find more information about this technique here; but the general idea is: after a model has been trained on any complex (image) data set, it will have learned in its lowest layers (most of the time: convolutions) to detect very basic features, such as edges, corners, etc. This helps the model learn to analyze your own data set much faster, as it does not have to learn to detect these basic features again.
Following @FlashTek's answer, we can also train this model on our own dataset.
Look at the following code:
model = applications.ResNet50(weights = "imagenet", include_top=False,
                              input_shape = (img_width, img_height,3))
# Freeze the layers which you don't want to train. Here I am freezing the first 30 layers.
for layer in model.layers[0:30]:
    layer.trainable = False
for layer in model.layers[30:]:
    layer.trainable = True
#Adding custom Layers
x = Flatten()(model.output)
# x = Dense(1024, activation="relu")(x)
# x = Dropout(0.5)(x)
# x = Dense(1024, activation="relu")(x)
# x = Dropout(0.5)(x)
x = Dense(1024, activation="relu")(x)
predictions = Dense(2, activation="softmax")(x)
In the above code we specify how many layers of ResNet to train on our dataset by setting layer.trainable to True to train a layer on your dataset, or to False to keep it frozen.
Apart from that, we can also add layers after the network, as shown under "Adding custom Layers".
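To see what the trainable flags actually change, here is a small sketch on a toy model (the layer names and sizes are made up for illustration; ResNet50 behaves the same way, just with many more layers):

```python
import keras
from keras import layers

inputs = keras.Input(shape=(32, 32, 3))
x = layers.Conv2D(4, 3, activation="relu", name="conv_a")(inputs)
x = layers.Conv2D(4, 3, activation="relu", name="conv_b")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(2, activation="softmax", name="head")(x)
model = keras.Model(inputs, outputs)

# model.layers is [input, conv_a, conv_b, flatten, head].
# Freeze the first two entries (input layer and conv_a), train the rest.
for layer in model.layers[:2]:
    layer.trainable = False
for layer in model.layers[2:]:
    layer.trainable = True

# conv_a's kernel and bias are now excluded from training.
n_trainable = len(model.trainable_weights)
n_frozen = len(model.non_trainable_weights)

# Compile after changing the flags so the training step picks them up.
model.compile(optimizer="adam", loss="categorical_crossentropy")
```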
