ResNet always predicting one class - python

I am trying to do transfer learning in Keras + TensorFlow on a selected subset of the Places-205 dataset, containing only 27 categories. I am using InceptionV3, DenseNet121 and ResNet50, pre-trained on ImageNet, and add a couple of extra layers to adapt to my classes. If the model is ResNet, I add Flatten + Dense for classification, and if it is DenseNet or InceptionV3, I add Global Avg Pool + Dense (relu) + Dense (classification).
This is the code snippet:
x = base_model.output
if FLAGS.model in 'resnet50':
    x = Flatten(name="flatten")(x)
else:
    x = GlobalAveragePooling2D()(x)
    # Let's add a fully-connected layer
    x = Dense(1024, activation='relu')(x)
# And a logistic layer
predictions = Dense(classes, activation='softmax')(x)
For DenseNet and InceptionV3 the training is fine and the validation accuracy hits 70%, but for ResNet the validation accuracy stays fixed at 0.0369/0.037 (which is 1/27, my number of classes). It seems like it always predicts one class, which is strange because training progresses fine and the model-agnostic part of the code is exactly the same as for DenseNet and InceptionV3, which work as expected.
Do you have any idea why it happens?
Thanks a lot!

I had a similar issue to you, @Ciprian Andrei Focsaneanu, and what I found worked was to make the previous layers (before the fully connected layers) trainable, as the filters/features of the ResNet50 were not suitable for my application.
Strangely enough, I also trained a VGG16 model, which was pretrained on the same images (ImageNet), and its filters did work for my application, but I digress.
Here's the link to a page that inspired me to do this: https://datascience.stackexchange.com/questions/16840/multi-class-neural-net-always-predicting-1-class-after-optimization
Hope this helps!
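For reference, a minimal sketch of what that unfreezing could look like for the ResNet50 setup in the question (base_model, predictions and classes refer to the snippet above; the cut-off index and the optimizer settings are illustrative assumptions, not prescribed values):
from keras.models import Model
from keras.optimizers import SGD

# Unfreeze the deeper ResNet50 blocks so their filters can adapt to the new data.
for layer in base_model.layers[:143]:   # roughly everything before the conv5 block stays frozen
    layer.trainable = False
for layer in base_model.layers[143:]:
    layer.trainable = True

model = Model(inputs=base_model.input, outputs=predictions)
# Recompile after changing trainable flags, ideally with a small learning rate.
model.compile(optimizer=SGD(lr=1e-4, momentum=0.9),
              loss='categorical_crossentropy',
              metrics=['accuracy'])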


When fine-tuning a pre-trained Model, how does tensorflow know that the base_model has been changed?

Ng's Convolutional Neural Network class's Week 2 Lab on using Transfer Learning with MobileNetV2 (summary: https://github.com/EhabR98/Transfer-Learning-with-MobileNetV2) and an additional tutorial (https://blog.roboflow.com/how-to-train-mobilenetv2-on-a-custom-dataset/) both begin like this:
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE, include_top=False, weights='imagenet')
base_model.trainable = False
They then proceed to add a pooling layer (or layers), a Dropout layer and a 1-unit Dense layer to the end, apply a BinaryCrossentropy loss and some kind of optimizer, then train it on some custom input data. Let's call this custom model "model2", as Ng's lab does.
Here's what the Coursera class model looks like. It's important to include here because the variable base_model is used in two different scopes throughout the Coursera lab (previous to this, it was defined outside of any method, as base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE, include_top=True, weights='imagenet'); base_model.trainable = False):
def alpaca_model(image_shape=IMG_SIZE, data_augmentation=data_augmenter()):
    input_shape = image_shape + (3,)
    base_model = tf.keras.applications.MobileNetV2(input_shape=input_shape, include_top=False, weights='imagenet')
    base_model.trainable = False
    inputs = tf.keras.Input(shape=input_shape)
    x = data_augmentation(inputs)
    x = preprocess_input(x)
    x = base_model(x, training=False)
    x = tfl.GlobalAveragePooling2D()(x)
    x = tfl.Dropout(0.2)(x)
    prediction_layer = tfl.Dense(1)
    outputs = prediction_layer(x)
    model = tf.keras.Model(inputs, outputs)
    return model
model2 = alpaca_model()
base_learning_rate = 0.001
initial_epochs = 5
model2.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate), loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), metrics=["accuracy"])
history = model2.fit(train_dataset, validation_data=validation_dataset, epochs=initial_epochs)
This performs OK, getting as much as 80% accuracy
Fine tuning -- Now in both the course lab and the tutorial, they then proceed to "unfreeze" some of the last layers of the internal network so that they can be trained, like so:
fine_tune_at = 120
base_model = model2.layers[4]  # totally separate question, but I would love to hear in comments what this does exactly. It is difficult to Google this.
base_model.trainable = True
print("#/layers in base model: ", len(base_model.layers))
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False
loss_function = tf.keras.losses.BinaryCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=base_learning_rate * 0.1)
metrics = ['accuracy']
fine_tune_epochs = 5
total_epochs = initial_epochs + fine_tune_epochs
Up until this point, I'm satisfied, I can clearly see what is going on, but then:
model2.compile(loss=loss_function,optimizer=optimizer,metrics=metrics)
history_fine = model2.fit(train_dataset, epochs=total_epochs, initial_epoch=history.epoch[-1], validation_data = validation_dataset)
This leads to a marked improvement in results, which confused me: I was very much expecting base_model to be passed in somehow. I didn't imagine that altering some other variable that was never passed in or referenced again would come into play.
So given all of that context, the question is: How is altering the base_model affecting model2?
If the above example from the Coursera lab is as confusing to you as it is to me, the example shown on https://blog.roboflow.com/how-to-train-mobilenetv2-on-a-custom-dataset/ as mentioned above is much simpler and contains much less ambiguity as base_model is defined only once. Regardless, the same dynamic applies and I'm equally confused on both. Thanks again for your time
Regarding your side note, "totally separate question, but I would love to hear in comments, what this does exactly":
The following line gets the MobileNetV2 model:
base_model = model2.layers[4]
Why 4? Because the first layer is the input, the second layer is the data augmentation (a Sequential model), the third and fourth layers perform image preprocessing (divide by 127.5 and subtract 1 so values fall between -1 and 1), and the fifth layer (index 4) is MobileNetV2. The remaining layers are your top-net.
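If in doubt, you can enumerate the layers yourself (a quick check, assuming the model2 built above):
for i, layer in enumerate(model2.layers):
    print(i, layer.name, type(layer).__name__)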
How is altering the base_model affecting model2?
base_model = model2.layers[4] does not copy anything: it is a reference to the very same layer object that lives inside model2, so changing its trainable flags changes model2 directly. During the first pass (transfer learning), all layers of MobileNetV2 are frozen, so its weights and biases remain intact. For the second pass (fine-tuning), the last convolution layers (blocks 13 to 16 and the final Conv2D) are unfrozen, so the model can now modify the weights and biases of the base model; those layers will therefore change during training.
To view the full model summary with nested models, use:
>>> model2.summary(line_length=125, expand_nested=True, show_trainable=True)
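For illustration, a minimal check (assuming the model2 from the question) that index 4 really is the nested MobileNetV2 and that flipping its trainable flag is immediately visible through model2:
inner = model2.layers[4]              # the nested MobileNetV2 model
print(inner.name)                     # e.g. 'mobilenetv2_1.00_160'
print(len(model2.trainable_weights))  # small while the base is frozen
inner.trainable = True                # same object that model2 holds, so...
print(len(model2.trainable_weights))  # ...model2 now reports many more trainable weights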
I'll go ahead and post an answer that speaks to (the problem with) Professor Ng's Convolutional Neural Network course, Week 2 assignment, "Transfer Learning with MobileNet", with the hope that other students might find this answer and realize that they are not crazy and that the lab was poorly coded.
I'm not sure how it appears to work in Jupyter, but the main reason I was having problems with this lab was that base_model was defined several times within the lab; it should have been defined only once. Even worse, base_model was redefined inside the alpaca_model() function, and that definition is not accessible outside the function's scope. I'm not in the industry, but redefining a variable inside a function when it has already been defined, and then using it again outside of the function, is just plain terrible coding.
Once I took base_model out of the function and defined it beforehand, everything works perfectly, not just on the computer but in my head.
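A minimal sketch of that restructuring (my own arrangement rather than the lab's code; the names follow the snippets above):
# Define the base model exactly once, outside any function.
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SIZE + (3,),
                                               include_top=False,
                                               weights='imagenet')
base_model.trainable = False

def alpaca_model(base_model, image_shape=IMG_SIZE, data_augmentation=data_augmenter()):
    inputs = tf.keras.Input(shape=image_shape + (3,))
    x = data_augmentation(inputs)
    x = preprocess_input(x)
    x = base_model(x, training=False)
    x = tfl.GlobalAveragePooling2D()(x)
    x = tfl.Dropout(0.2)(x)
    outputs = tfl.Dense(1)(x)
    return tf.keras.Model(inputs, outputs)

model2 = alpaca_model(base_model)

# Later, fine-tuning refers to the same base_model object directly:
base_model.trainable = True
for layer in base_model.layers[:120]:
    layer.trainable = False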

Extracting features from EfficientNet Tensorflow

I have a CNN model trained using EfficientNetB6.
My task is to extract the features of this trained model by removing the last dense layer and then using those weights to train a boosting model.
I did this using PyTorch earlier: I was able to extract the features from the layers I was interested in, predict on my validation set, and then boost.
I am doing this now in TensorFlow but am currently stuck.
Below is my model structure. I have tried using the code on the website but did not have any luck.
I want to remove the last dense layer and predict on the validation set using the remaining layers.
I tried using :
layer_name = 'efficientnet-b6'
intermediate_layer_model = tf.keras.Model(inputs = model.input, outputs = model.get_layer(layer_name).output)
but I get an error:
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_1:0", shape=(None, 760, 760, 3), dtype=float32) at layer "input_1". The following previous layers were accessed without issue: []
Any way to resolve this?
Sorry, my bad.
I simply added a GlobalAveragePooling2D layer after the EfficientNet layer and I am able to extract the features and continue :)
just for reference:
def build_model(dim=CFG['net_size'], ef=0):
    inp = tf.keras.layers.Input(shape=(dim, dim, 3))
    base = EFNS[ef](input_shape=(dim, dim, 3), weights='imagenet', include_top=False)
    x = base(inp)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(1, activation='sigmoid')(x)
    model = tf.keras.Model(inputs=inp, outputs=x)
    opt = tf.keras.optimizers.Adam(learning_rate=0.001)
    loss = tf.keras.losses.BinaryCrossentropy(label_smoothing=0.05)
    model.compile(optimizer=CFG['optimizer'], loss=loss, metrics=[tf.keras.metrics.AUC(name='auc')])
    print(model.summary())
    return model
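A possible way to then pull the pooled features out for the boosting step (my own sketch, assuming a model built with build_model() above; validation_images is a placeholder name for your validation set):
model = build_model()
# In this build, the second-to-last layer is the GlobalAveragePooling2D output.
feature_extractor = tf.keras.Model(inputs=model.input, outputs=model.layers[-2].output)
val_features = feature_extractor.predict(validation_images)  # feed these to the boosting model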

How can I improve my CNN's accuracy evolution?

So, I'm trying to create a CNN which can predict whether there are any "support devices" in a thorax x-ray image, but when training my model it seems it's not learning anything.
I'm using a dataset called "CheXpert", which has over 200,000 images. After doing some "cleaning", the final dataset ended up with 100,000 images.
As far as the model is concerned, I imported the convolutional base of the pretrained VGG16 model and added two fully connected layers myself. Then I froze the entire convolutional base and made only the fully connected layers trainable. Here's the code:
from keras.applications import VGG16
from keras.layers import Dense, Dropout, GlobalAveragePooling2D
from keras.models import Model

pretrained_model = VGG16(weights='imagenet', include_top=False)
pretrained_model.summary()

for layer in pretrained_model.layers:
    layer.trainable = False

x = pretrained_model.output
x = GlobalAveragePooling2D()(x)
dropout = Dropout(0.25)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
x = dropout(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(1, activation='sigmoid')(x)
final_model = Model(inputs=pretrained_model.input, outputs=predictions)
final_model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
As far as I know, the normal behavior should be that the accuracy starts low and then grows over the epochs, but here it only oscillates between the same values (0.93 and 0.95). I'm sorry I cannot upload images to show you the graphs.
To sum up, I want to know if that little variance in the accuracy means that the model is not learning anything.
I have a hypothesis: of the 100,000 images in the dataset, 95,000 have the label "1" and only 5,000 have the label "0". I think that if I reduced the number of images labeled "1" to match the number labeled "0", the results would change.
The lack of images labeled "0" certainly doesn't help the CNN. I also suggest lowering the learning rate and playing around with the batch size to see if anything changes.
I hope this helps.
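For example, replacing the string optimizer with an explicit one at a lower learning rate could look like this (a sketch with untuned values, reusing the final_model from the question):
from keras.optimizers import RMSprop

final_model.compile(loss='binary_crossentropy',
                    optimizer=RMSprop(lr=1e-5),  # well below the default 1e-3
                    metrics=['accuracy'])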
Because of the imbalanced training data, I suggest setting class_weight during the training step. The more data a class has, the lower its class weight should be.
class_weight = {0: 1.5, 1: 0.5}
model.fit(X, Y, class_weight=class_weight)
You can check the class_weight argument in the Keras documentation:
class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to "pay more attention" to samples from an under-represented class.

Adding Dropout to testing/inference phase

I've trained the following model for some timeseries in Keras:
input_layer = Input(batch_shape=(56, 3864))
first_layer = Dense(24, input_dim=28, activation='relu',
                    activity_regularizer=None,
                    kernel_regularizer=None)(input_layer)
first_layer = Dropout(0.3)(first_layer)
second_layer = Dense(12, activation='relu')(first_layer)
second_layer = Dropout(0.3)(second_layer)
out = Dense(56)(second_layer)
model_1 = Model(input_layer, out)
Then I defined a new model with the trained layers of model_1 and added dropout layers with a different rate, drp, to it:
input_2 = Input(batch_shape=(56, 3864))
first_dense_layer = model_1.layers[1](input_2)
first_dropout_layer = model_1.layers[2](first_dense_layer)
new_dropout = Dropout(drp)(first_dropout_layer)
snd_dense_layer = model_1.layers[3](new_dropout)
snd_dropout_layer = model_1.layers[4](snd_dense_layer)
new_dropout_2 = Dropout(drp)(snd_dropout_layer)
output = model_1.layers[5](new_dropout_2)
model_2 = Model(input_2, output)
Then I'm getting the prediction results of these two models as follow:
result_1 = model_1.predict(test_data, batch_size=56)
result_2 = model_2.predict(test_data, batch_size=56)
I was expecting to get completely different results because the second model has new dropout layers and these two models are different (IMO), but that's not the case. Both generate the same result. Why is that happening?
As I mentioned in the comments, the Dropout layer is turned off in the inference phase (i.e. test mode), so when you use model.predict() the Dropout layers are not active. However, if you would like to have a model that uses Dropout both in training and in inference, you can pass the training argument when calling it, as suggested by François Chollet:
# ...
new_dropout = Dropout(drp)(first_dropout_layer, training=True)
# ...
Alternatively, if you have already trained your model and now want to use it in inference mode while keeping the Dropout layers (and possibly other layers which behave differently in training and inference, such as BatchNormalization) active, you can define a backend function that takes the model's inputs as well as the Keras learning phase:
from keras import backend as K
func = K.function(model.inputs + [K.learning_phase()], model.outputs)
# to use it pass 1 to set the learning phase to training mode
outputs = func([input_arrays] + [1.])
Your question has a simple solution in the latest version of TensorFlow: you can set the training argument of the model's call method to True.
You can run code like the line below:
model(input, training=True)
With training=True, TensorFlow keeps the Dropout layers active even when the model is called for inference.
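For example, a minimal Monte Carlo dropout sketch along those lines (assuming TF 2.x eager mode and the model_2 and test_data from the question, with test_data matching the model's expected batch shape):
import numpy as np

# Each call with training=True samples a fresh dropout mask, so repeated
# predictions differ; averaging them gives a smoother estimate and their
# spread gives a rough uncertainty measure.
preds = np.stack([model_2(test_data, training=True).numpy() for _ in range(10)])
mean_pred = preds.mean(axis=0)
std_pred = preds.std(axis=0)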
As there are already some working code solutions above, I will simply add a few more details regarding dropout during inference to prevent confusion.
Based on the original paper, Dropout layers randomly turn off neuron units (set their outputs to zero) during training to reduce overfitting. However, once we finish training and start testing the model, we do not 'touch' any neurons; all units take part in the decision when inferencing. Because no units are dropped anymore, the activations flowing out of a layer would be larger than the network saw during training. To prevent this, a scaling factor is applied to balance the network: to be precise, if a unit is retained with probability p during training, the outgoing weights of that unit are multiplied by p at prediction time.
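A tiny numeric illustration of that scaling (using the original-paper formulation described above):
import numpy as np

rng = np.random.default_rng(0)
p, w, a = 0.8, 0.5, 2.0                     # keep probability, outgoing weight, input activation
masks = rng.random(100_000) < p             # simulated keep/drop decisions during training
train_expectation = (masks * w * a).mean()  # expected contribution while training, ≈ p * w * a
test_value = (p * w) * a                    # contribution at test time with the scaled weight
print(train_expectation, test_value)        # both ≈ 0.8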

Keras: freezing layers during training does not give consistent output

I am trying to fine-tune a model using keras, according to this description: https://keras.io/applications/#inceptionv3
However, during training I discovered that the output of the network does not remain constant after training when using the same input (while all relevant layers were frozen), which I do not want.
I constructed the following toy example to investigate this:
import keras.applications.resnet50 as resnet50
from keras.layers import Dense, Flatten, Input
from keras.models import Model
from keras.utils import to_categorical
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
import numpy as np
# data
i = np.random.rand(1,224,224,3)
X = np.random.rand(32,224,224,3)
y = to_categorical(np.random.randint(751, size=32), num_classes=751)
# model
base_model = resnet50.ResNet50(weights='imagenet', include_top=False, input_tensor=Input(shape=(224,224,3)))
layer = base_model.output
layer = Flatten(name='myflatten')(layer)
layer = Dense(751, activation='softmax', name='fc751')(layer)
model = Model(inputs=base_model.input, outputs=layer)
# freeze all layers
for layer in model.layers:
    layer.trainable = False
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# features and predictions before training
feat0 = base_model.predict(i)
pred0 = model.predict(i)
weights0 = model.layers[-1].get_weights()
# before training output is consistent
feat00 = base_model.predict(i)
pred00 = model.predict(i)
print(np.allclose(feat0, feat00)) # True
print(np.allclose(pred0, pred00)) # True
# train
model.fit(X, y, batch_size=2, epochs=3, shuffle=False)
# features and predictions after training
feat1 = base_model.predict(i)
pred1 = model.predict(i)
weights1 = model.layers[-1].get_weights()
# these are not the same
print(np.allclose(feat0, feat1)) # False
# Optionally: printing shows they are in fact very different
# print(feat0)
# print(feat1)
# these are not the same
print(np.allclose(pred0, pred1)) # False
# Optionally: printing shows they are in fact very different
# print(pred0)
# print(pred1)
# these are the same and loss does not change during training
# so layers were actually frozen
print(np.allclose(weights0[0], weights1[0])) # True
# Check again if all layers were in fact untrainable
for layer in model.layers:
    assert layer.trainable == False  # All succeed
# Being overly cautious, also checking base_model
for layer in base_model.layers:
    assert layer.trainable == False  # All succeed
Since I froze all layers, I fully expected both predictions and both feature arrays to be equal, but surprisingly they aren't.
So I am probably making some kind of mistake, but I can't figure out what. Any suggestions would be greatly appreciated!
So the problem seems to be that the model uses batch normalization layers, which update their internal state (their moving mean and variance, stored as non-trainable weights) based on the data seen during training. This happens even when their trainable flag has been set to False, and since those weights are updated, the output changes as well. You can check this by using the code in the question and changing the following lines:
Change this
weights0 = model.layers[-1].get_weights()
to
weights0 = model.layers[2].get_weights()
and this
weights1 = model.layers[-1].get_weights()
to
weights1 = model.layers[2].get_weights()
(or the index of any other batch normalization layer).
Because then the following assertion will no longer hold:
print(np.allclose(weights0, weights1)) # Now this is False
As far as I am aware, there is currently no solution for this yet.
See also my issue on Keras' Github page.
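If you want to see those statistics moving yourself, a quick way to inspect them (assuming the base_model from the question, in the Keras version the question uses) is:
from keras.layers import BatchNormalization

# Moving mean/variance are stored as non-trainable weights of each BN layer and
# are updated by training batches even though layer.trainable is False here.
for idx, layer in enumerate(base_model.layers):
    if isinstance(layer, BatchNormalization):
        gamma, beta, moving_mean, moving_var = layer.get_weights()
        print(idx, layer.name, moving_mean[:3], moving_var[:3])
        break  # inspect just the first BN layer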
One more reason for unstable training could be that you are using a very small batch size (batch_size=2); use at least batch_size=32. Such a small batch is too small for batch normalization to reliably estimate the training distribution statistics (mean and variance). These mean and variance values are used to normalize the activations first, followed by the learning of the beta and gamma parameters (the actual output distribution).
Check the following links for more details:
In the introduction and related work, the authors criticize BatchNorm; also check Figure 1: https://arxiv.org/pdf/1803.08494.pdf
A nice article on the "Curse of Batch Norm": https://towardsdatascience.com/curse-of-batch-normalization-8e6dd20bc304
