I'm researching the possibility of implementing a CNN in order to classify images as "good" or "bad" but am having no luck with my current architecture.
Characteristics that denote a "bad" image:
Overexposure
Oversaturation
Incorrect white balance
Blurriness
Would it be feasible to implement a neural network to classify images based on these characteristics or is it best left to a traditional algorithm that simply looks at the variance in brightness/contrast throughout an image and classifies it that way?
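For comparison, the "traditional algorithm" route mentioned above can be sketched in a few lines. This is only a rough illustration in plain Python on toy grayscale grids: the function names, the 240 brightness threshold and the 4-neighbour Laplacian are assumptions chosen for the example, not a tested image-quality metric.

```python
from statistics import pvariance

def overexposed_fraction(gray, threshold=240):
    """Fraction of pixels at or above `threshold` (0-255 grayscale values)."""
    pixels = [p for row in gray for p in row]
    return sum(p >= threshold for p in pixels) / len(pixels)

def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over interior pixels.

    Low values mean few sharp edges, which is a common hint of blur.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)

# Toy 6x6 "images": a high-contrast checkerboard, a flat grey patch
# and a uniformly bright (washed-out) patch.
sharp = [[255 if (x + y) % 2 == 0 else 0 for x in range(6)] for y in range(6)]
flat = [[128] * 6 for _ in range(6)]
washed_out = [[250] * 6 for _ in range(6)]

print(laplacian_variance(sharp), laplacian_variance(flat))
print(overexposed_fraction(washed_out), overexposed_fraction(flat))
```

Simple statistics like these could serve as a baseline, or as extra input features, before deciding whether a CNN is needed at all.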
I have attempted training a CNN using the VGGNet architecture but I always seem to get a biased and unreliable model, regardless of the number of epochs or number of steps.
Examples:
My current model's architecture is very simple (as I am new to the whole machine learning world) but seemed to work fine with other classification problems, and I have modified it slightly to work better with this binary classification problem:
# CONV => RELU => POOL layer set
# define convolutional layers, use "ReLU" activation function
# and reduce the spatial size (width and height) with pool layers
model.add(Conv2D(32, (3, 3), padding="same", input_shape=input_shape)) # 32 3x3 filters (height, width, depth)
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)
# (CONV => RELU) * 2 => POOL layer set (increasing number of layers as you go deeper into CNN)
model.add(Conv2D(64, (3, 3), padding="same", input_shape=input_shape)) # 64 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(Conv2D(64, (3, 3), padding="same", input_shape=input_shape)) # 64 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)
# (CONV => RELU) * 3 => POOL layer set (input volume size becoming smaller and smaller)
model.add(Conv2D(128, (3, 3), padding="same", input_shape=input_shape)) # 128 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(Conv2D(128, (3, 3), padding="same", input_shape=input_shape)) # 128 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(Conv2D(128, (3, 3), padding="same", input_shape=input_shape)) # 128 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)
# only set of FC => RELU layers
model.add(Flatten())
model.add(Dense(512))
model.add(Activation("relu"))
model.add(BatchNormalization())
model.add(Dropout(0.5))
# sigmoid classifier (output layer)
model.add(Dense(classes))
model.add(Activation("sigmoid"))
Are there any glaring omissions or mistakes in this model, or can I simply not solve this problem using deep learning (with my current GPU, a GTX 970)?
Thanks for your time and experience,
Josh
EDIT:
Here is my code for compiling/training the model:
# initialise the model and optimiser
print("[INFO] Training network...")
opt = SGD(lr=initial_lr, decay=initial_lr / epochs)
model.compile(loss="sparse_categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
# set up checkpoints
model_name = "output/50_epochs_{epoch:02d}_{val_acc:.2f}.model"
checkpoint = ModelCheckpoint(model_name, monitor='val_acc', verbose=1,
save_best_only=True, mode='max')
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2,
patience=5, min_lr=0.001)
tensorboard = TensorBoard(log_dir="logs/{}".format(time()))
callbacks_list = [checkpoint, reduce_lr, tensorboard]
# train the network
H = model.fit_generator(training_set, steps_per_epoch=500, epochs=50, validation_data=test_set, validation_steps=150, callbacks=callbacks_list)
Independently of any other advice (including the answer already provided), and assuming classes=2 (which you don't clarify - there is a reason we ask for a MCVE here), you seem to make a fundamental mistake in your final layer, i.e.:
# sigmoid classifier (output layer)
model.add(Dense(classes))
model.add(Activation("sigmoid"))
A sigmoid activation is suitable only if your final layer consists of a single node; if classes=2, as I suspect, based also on your puzzling statement in the comments that
with three different images, my results are 0.987 bad and 0.999 good
and
I was giving you the predictions from the model previously
you should use a softmax activation, i.e.
model.add(Dense(classes))
model.add(Activation("softmax"))
Alternatively, you could use sigmoid, but your final layer should consist of a single node, i.e.
model.add(Dense(1))
model.add(Activation("sigmoid"))
The latter is usually preferred in binary classification settings, but the results should be the same in principle.
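To see why two sigmoid nodes give puzzling outputs, here is a small sketch in plain Python. The two logits are made-up values, picked only so that the sigmoid outputs resemble the 0.987 and 0.999 quoted above; they are not taken from the actual model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs (logits) of a 2-node final layer.
logits = [4.3, 6.9]

# Two sigmoids: each node is squashed independently, so the two
# "probabilities" need not sum to 1.
independent = [sigmoid(z) for z in logits]

# Softmax: the two nodes compete, so the outputs always sum to 1.
coupled = softmax(logits)

# One sigmoid node on the logit difference gives the same probability
# as the second softmax output.
single_node = sigmoid(logits[1] - logits[0])

print(independent, sum(independent))
print(coupled, sum(coupled))
print(single_node)
```

With these numbers both sigmoid outputs are close to 1, which is exactly the kind of "0.987 bad and 0.999 good" result that cannot be read as a probability distribution over two classes.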
UPDATE (after updating the question):
sparse_categorical_crossentropy is not the correct loss here, either.
All in all, try the following changes:
model.compile(loss="binary_crossentropy", optimizer=Adam(), metrics=["accuracy"])
# final layer:
model.add(Dense(1))
model.add(Activation("sigmoid"))
with Adam optimizer (needs import). Also, dropout should not be used by default - see this thread; start without it and only add if necessary (i.e. if you see signs of overfitting).
I suggest you go for transfer learning instead of training the whole network from scratch.
Use weights trained on a huge dataset such as ImageNet.
You can do this easily with Keras: import a model with pre-trained weights, such as Xception, and replace the last layer, which represents the 1000 classes of the ImageNet dataset, with a 2-node dense layer, because you have only 2 classes. Set trainable=False for the base layers and leave trainable=True for the custom layers you add, such as the 2-node dense layer.
You can then train the model in the usual way.
Demo code -
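from keras.applications import Xception
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
# load Xception pre-trained on ImageNet, without its 1000-class top layer
# (Keras expects input_shape as (height, width, channels))
base_model = Xception(input_shape=(img_height, img_width, 3), weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
# new 2-class output layer
predictions = Dense(2, activation='softmax')(x)
model = Model(base_model.input, predictions)
# freezing the base layer weights
for layer in base_model.layers:
    layer.trainable = False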
I was wondering whether creating the model by passing activity_regularizer='l1_l2' as an argument to Conv2D()
model = keras.Sequential()
model.add(Conv2D(filters=16, kernel_size=(6, 6), strides=3, padding='valid', activation='relu',
activity_regularizer='l1_l2', input_shape=X_train[0].shape))
model.add(Dropout(0.2))
model.add(MaxPooling2D(pool_size=(3, 1), strides=3, padding='valid'))
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
model.compile(optimizer=Adam(learning_rate = 0.001), loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])
model.summary()
history = model.fit(X_train, y_train, epochs = 10, validation_data = (X_val, y_val), verbose=0)
would be mathematically different from creating the model by adding model.add(ActivityRegularization(l1=..., l2=...)) separately?
model = keras.Sequential()
model.add(Conv2D(filters=16, kernel_size=(6, 6), strides=3, padding='valid', activation='relu',
input_shape=X_train[0].shape))
model.add(Dropout(0.2))
model.add(ActivityRegularization(l1=some_number, l2=some_number))
model.add(MaxPooling2D(pool_size=(3, 1), strides=3, padding='valid'))
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
model.compile(optimizer=Adam(learning_rate = 0.001), loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])
model.summary()
history = model.fit(X_train, y_train, epochs = 10, validation_data = (X_val, y_val), verbose=0)
For me, it is hard to tell, as training always involves some randomness. But the results seem similar.
One additional question I have is: I accidentally passed the activity_regularizer='l1_l2' argument to the MaxPooling2D() layer before, and the code ran. How can that be, considering that activity_regularizer is not given as a possible argument for MaxPooling2D() in tensorflow?
Technically, if you are not applying any other constraint on the layer output, applying the activity regularizer inside the convolution layer is the same as applying it outside. However, applying it outside the convolution layer gives the user more flexibility. For instance, the user might want to regularize the output units after the skip connections are set up instead of right after the convolution. It is just like having an activation function inside the convolution layer versus using keras.activations to apply the activation after the convolution layer. Sometimes this is done after batch normalization.
For your second question, the MaxPool2D layer does accept the activity_regularizer argument. Even though this is not mentioned in the documentation, it makes sense intuitively, since the user might want to regularize the outputs after max-pooling. You can check that activity_regularizer works not only with the MaxPool2D layer but also with other layers, such as BatchNormalization, for the same reason.
tf.keras.layers.Conv2D(64 , 2 , padding='same', activity_regularizer='l1_l2')
and this code,
tf.keras.layers.Conv2D(64 , 2 , padding='same')
tf.keras.layers.ActivityRegularization()
They both do the same job, provided the same l1 and l2 coefficients are used in both; applying the regularizer inside or outside the layer has the same effect. Moreover, TensorFlow builds a graph on the backend that first applies the convolution layer and then the activity regularization, so in both cases the computation is done in the same way, with no difference.
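The equivalence can be illustrated with a toy calculation in plain Python. Everything here is hypothetical: the activations, the task loss and the 0.01 coefficients are made-up values, and any scaling a particular Keras version applies to the penalty is ignored. The sketch only shows that the penalty is a function of the activations and the coefficients, not of where it is attached.

```python
def l1_l2_penalty(activations, l1, l2):
    """Activity penalty: l1 * sum(|a|) + l2 * sum(a^2)."""
    return (l1 * sum(abs(a) for a in activations)
            + l2 * sum(a * a for a in activations))

def relu(z):
    return max(0.0, z)

# Hypothetical convolution outputs before and after the ReLU activation.
pre_activations = [-1.5, 0.5, 2.0, -0.25]
conv_output = [relu(z) for z in pre_activations]

l1, l2 = 0.01, 0.01   # assumed coefficients, identical in both variants
task_loss = 0.7       # hypothetical classification loss

# Variant 1: the penalty is attached to the layer itself.
loss_inside = task_loss + l1_l2_penalty(conv_output, l1, l2)

# Variant 2: a separate pass-through layer returns its input unchanged
# and contributes the same penalty to the total loss.
def activity_regularization_layer(x, l1, l2):
    return x, l1_l2_penalty(x, l1, l2)

passed_through, penalty = activity_regularization_layer(conv_output, l1, l2)
loss_outside = task_loss + penalty

print(loss_inside, loss_outside)
```

One caveat when comparing the two models in the question: in the second one the ActivityRegularization layer sits after Dropout(0.2), so during training it penalizes the dropped-out activations rather than the raw convolution output. It is also worth passing the coefficients explicitly in both variants rather than relying on defaults, which may differ between the string shortcut and the layer.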
Here is my code for training a CNN on image recognition:
from keras.models import Sequential
from keras.layers import Activation, Conv2D, Dense, Dropout, Flatten, LeakyReLU, MaxPooling2D
# definition of the model
def make_model():
    model = Sequential()
    model.add(Conv2D(16, (3,3), input_shape = (32,32,3), padding = "same",
                     kernel_initializer="glorot_uniform"))
    model.add(LeakyReLU(alpha=0.1))
    model.add(Conv2D(32, (3,3), input_shape = (32,32,3), padding = "same",
                     kernel_initializer="glorot_uniform"))
    model.add(LeakyReLU(alpha=0.1))
    model.add(MaxPooling2D(pool_size = (2,2), padding = "same"))
    model.add(Dropout(0.25))
    model.add(Conv2D(32, (3,3), input_shape = (32,32,3), padding = "same"))
    model.add(LeakyReLU(alpha=0.1))
    model.add(Conv2D(64, (3,3), input_shape = (32,32,3), padding = "same"))
    model.add(LeakyReLU(alpha=0.1))
    model.add(MaxPooling2D(pool_size = (2,2), padding = "same"))
    model.add(Dropout(0.25))
    # layer
    model.add(Flatten())
    model.add(Dense(256))
    # for activation
    model.add(LeakyReLU(alpha=0.1))
    model.add(Dropout(0.5))
    model.add(Dense(10))
    # for activation
    model.add(LeakyReLU(alpha=0.1))
    model.add(Activation("softmax"))
    return model
Training then gets stuck at the following result, which worries me:
loss: 7.4918; acc: 0.1226.
I have tried a few other approaches, but I don't know what I should do to get on the right path.
Without more details about the problem, it is difficult to investigate further.
But I would encourage you to look more into:
BatchNormalization
loss function
learning rate
optimizer
hidden layers
The current state of the art is to apply convolution together with Batch Normalization and ReLU activation. The order should be the following:
Convolution
Batch Normalization
ReLU (it could also be Leaky ReLU or any other activation)
So you should add BN after your convolutions, and you should also remove Dropout. Several studies report that Dropout is often unnecessary when BN is used, and that BN alone can actually perform better.
Other than this, you should probably experiment with hyperparameters such as the learning rate and the number of filters.
Also make sure that you are using a correct loss and an output activation that corresponds to your loss.
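As a sanity check on the loss and output activation, it can help to compare the reported numbers with what pure guessing would give. The sketch below is plain Python and assumes a 10-class problem trained with categorical cross-entropy on softmax outputs, which is what the model's last layer suggests; the loss used in the question is not actually shown.

```python
import math

def categorical_crossentropy(probs, true_index):
    """Cross-entropy for one sample: -log of the probability of the true class."""
    return -math.log(probs[true_index])

num_classes = 10
uniform = [1.0 / num_classes] * num_classes

# A model that outputs the same probability for every class.
chance_loss = categorical_crossentropy(uniform, true_index=3)  # equals ln(10)
chance_accuracy = 1.0 / num_classes

# Values reported in the question.
reported_loss = 7.4918
reported_accuracy = 0.1226

# Probability of the true class that would produce the reported loss
# if every sample had the same loss (a simplifying assumption).
implied_true_class_prob = math.exp(-reported_loss)

print(chance_loss, chance_accuracy)
print(implied_true_class_prob)
```

The reported accuracy is close to chance, but the reported loss is several times higher than the chance-level loss. That pattern is consistent with a network making confident wrong predictions, so the learning rate, the loss function and the label encoding are worth checking first.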
I have built and tested two convolutional Neural Network models (VGG-16 and 3-layer CNN) to predict classification of lung CT scans for COVID-19.
Prior to the classification, I've performed image segmentation via k-means clustering on images to try to improve the classification performance.
The segmented images look like below.
I trained and evaluated the VGG-16 model on the segmented images and the raw images separately. Lastly, I trained and evaluated a 3-layer CNN on the segmented images only. Below are the results for their train/validation loss and accuracy.
For the simple 3-layer CNN model, I can clearly see that the model trains well and starts to overfit after about 2 epochs. But I don't understand why the validation accuracy of the VGG model doesn't look like an exponential curve; instead it looks like a flat or fluctuating horizontal line.
Besides, the simple 3-layer CNN model seems to perform better. Is this due to vanishing gradients in the VGG model? Or are the images themselves so simple that a deep architecture doesn't help?
I'd appreciate it if you could share your knowledge of this learning behaviour.
This is the code for the VGG-16 model:
# build model
img_height = 256
img_width = 256
model = Sequential()
model.add(Conv2D(input_shape=(img_height,img_width,1),filters=64,kernel_size=(3,3),padding="same", activation="relu"))
model.add(Conv2D(filters=64,kernel_size=(3,3),padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(Flatten())
model.add(Dense(units=4096,activation="relu"))
model.add(Dense(units=4096,activation="relu"))
model.add(Dense(units=1, activation="sigmoid"))
opt = Adam(lr=0.001)
model.compile(optimizer=opt, loss=keras.losses.binary_crossentropy, metrics=['accuracy'])
And this is the code for the 3-layer CNN.
# build model
model2 = Sequential()
model2.add(Conv2D(32, 3, padding='same', activation='relu',input_shape=(img_height, img_width, 1)))
model2.add(MaxPool2D())
model2.add(Conv2D(64, 5, padding='same', activation='relu'))
model2.add(MaxPool2D())
model2.add(Flatten())
model2.add(Dense(128, activation='relu'))
model2.add(Dense(1, activation='sigmoid'))
opt = Adam(lr=0.001)
model2.compile(optimizer=opt, loss=keras.losses.binary_crossentropy, metrics=['accuracy'])
Thank you!
Looking at the accuracies, and assuming this is a binary problem, you can see that the model is just guessing at random (acc ~ 0.5).
The fact that your 3-layer model gives much better results on the train set indicates that you are not training the VGG model long enough for it to overfit.
In addition, you do not seem to use a proper initialization of the network. Note: early in an implementation, overfitting indicates that the training pipeline works, so it is a good thing at this stage.
Therefore, the first step would be to get the model to overfit. You seem to be training from scratch. In that case it can take a few hundred epochs until the gradients affect the first convolutions in a complex model like VGG16.
As the 3-layer CNN seems to overfit quite heavily, I conclude that your dataset is rather small.
Hence, I would recommend starting from a pre-trained model (VGG16) and re-training just the last two layers. This should give much better results.
Following what #CAFEBABE suggested, I tried two approaches. First, I increased the number of epochs to 200, changed the optimiser to SGD and reduced the learning rate to 1e-5.
Second, I loaded pre-trained weights for the VGG-16 model and trained only the last two convolutional layers. Below is the plot showing the tuned VGG-16 model, the pre-trained VGG-16 model and the 3-layer CNN model (from top to bottom).
Tuning certainly had an effect on the performance, but it was very marginal. I guess the learnable features in a dataset of ~600 images were not sufficient to train the model. The pre-trained weights helped significantly, with the model reaching overfitting at ~25 epochs. However, in comparison with the 3-layer CNN model, the test accuracies of these two models are similar, ranging between 0.7 and 0.8. I guess this is again due to the limitations of the dataset.
Thanks again to #CAFEBABE for helping with my problem; I hope this helps other people who face a similar problem.
Recently, I built a simple convolutional neural network for hand gesture image recognition, using background subtraction to make the hand a white shape on a black background. It was built mostly with Keras Conv2D layers. My dataset has 1000 pictures for training and 100 pictures for validation and testing. The problem, oddly, occurs immediately after the first epoch, during which the model's loss goes down a great deal. It usually drops from some big number like 183 to 1 by the start of the second epoch.
All the pictures in the dataset are of my own hand, captured using cv2, but I only tested with my own hand, so that should not be a problem. In case the dataset was the problem, I tried 3 different datasets, one of them using cv2's Canny method, which essentially traces the outline of the hand and makes the rest of the picture black, to see if that made a difference. Regardless, the same thing continued to happen. Furthermore, I added multiple Dropout layers in different places to see the effect, and the same thing always occurs: the loss decreases drastically and the model shows signs of overfitting. I have also implemented EarlyStopping and tried adding more layers to see if that helped, but the same result always seems to occur.
model = Sequential()
model.add(Conv2D(32, (3,3), activation = 'relu',
input_shape = (240, 215, 1)))
model.add(MaxPooling2D((2,2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3,3), activation = 'relu'))
model.add(Conv2D(64, (3,3), activation = 'relu'))
model.add(MaxPooling2D((2,2)))
model.add(Dropout(0.25))
model.add(Conv2D(128, (3,3), activation = 'relu'))
model.add(MaxPooling2D((2,2)))
model.add(Dropout(0.25))
model.add(Conv2D(256, (3,3), activation = 'relu'))
model.add(MaxPooling2D((2,2)))
model.add(Dropout(0.25))
#model.add(Conv2D(256, (3,3), activation = 'relu'))
#model.add(MaxPooling2D((2,2)))
#model.add(Conv2D(128, (3,3), activation = 'relu'))
#model.add(MaxPooling2D((2,2)))
#model.add(Conv2D(64, (3,3), activation = 'relu'))
#model.add(MaxPooling2D((2,2)))
model.add(Flatten())
model.add(Dense(150, activation = 'relu'))
#model.add(Dropout(0.25))
#model.add(Dense(1000, activation = 'relu'))
model.add(Dropout(0.75))
model.add(Dense(6, activation = 'softmax'))
model.summary()
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy',
metrics = ['acc'])
callbacks_list = [EarlyStopping(monitor = 'val_loss', patience = 10),
ModelCheckpoint(filepath = 'model.h6', monitor = 'val_loss',
save_best_only = True),]
The commented sections of the code are changes I have tried to implement. I have also varied the Dropout values and positions of them a great deal and nothing significant has changed. Could anyone offer any advice on why my model overfits that quickly?
Yes, it is a clear case of overfitting. Here are my suggestions:
Try reducing the number of hidden layers
Increase the dropout to 0.5
Create more synthetic images or apply transformations on the raw images.
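The last suggestion can be sketched with two very simple transformations. This is a minimal illustration in plain Python on a toy binary image; the helper names are made up for the example, and a real pipeline would more likely use the augmentation utilities of the image or deep learning library in use.

```python
def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def shift_right(img, pixels, fill=0):
    """Shift every row right by `pixels` columns (0 <= pixels <= width),
    padding the left edge with `fill`."""
    return [[fill] * pixels + row[:len(row) - pixels] for row in img]

def augment(img):
    """Return the original image plus three transformed copies."""
    flipped = flip_horizontal(img)
    return [img, flipped, shift_right(img, 1), shift_right(flipped, 1)]

# Toy 3x4 binary "hand silhouette" (1 = hand, 0 = background).
image = [
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
]

variants = augment(image)
for variant in variants:
    print(variant)
```

Whether a given transformation is safe depends on the labels: if some gestures differ only by left/right orientation, horizontal flips would change the class and should be left out.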
When dealing with such a massive overfitting phenomenon, a good starting point would be to reduce your number of layers.
Although you add a Dropout after many max-poolings, you still suffer from the overfitting phenomenon.
Here below I present some of my recommendations:
Ensure that you have a comprehensive dataset with clean labels. Regardless of how we might want to tune the neural network, if the dataset is not clean, we cannot obtain good results.
Add (to begin with) at most 3 stacks of convolution + max_pooling + dropout. (32 + 64 + 128) would be a good starting point.
Use GlobalAveragePooling2D instead of Dense layers. The latter are not needed in a Convolutional Neural Network, except for the last layer with sigmoid or softmax.
Try using SpatialDropout2D. Compared to typical Dropout, which is applied to each element in the feature map, SpatialDropout drops entire feature maps.
Try to use Data Augmentation. In this way, you create more artificial examples and your network will be less prone to overfitting.
If none of these work, ensure that you use a pre-trained network and apply transfer learning to your task at hand.
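To put the GlobalAveragePooling2D recommendation into numbers, the sketch below traces the feature-map shape through the convolution and pooling layers of the model in the question (3x3 convolutions without padding, 2x2 max-pooling) and counts the weights of the first dense layer. It is plain Python arithmetic, not Keras code, and the global-average-pooling variant is a hypothetical alternative, not something the question's model contains.

```python
def conv_valid(size, kernel=3):
    """Output size of a convolution with no padding and stride 1."""
    return size - kernel + 1

def pool(size, window=2):
    """Output size of max-pooling with no padding (rounds down)."""
    return size // window

# Input shape from the question: 240 x 215 x 1.
h, w, channels = 240, 215, 1

# Layer sequence of the question's model (Dropout does not change shapes).
layers = [("conv", 32), ("pool", None),
          ("conv", 64), ("conv", 64), ("pool", None),
          ("conv", 128), ("pool", None),
          ("conv", 256), ("pool", None)]

for kind, filters in layers:
    if kind == "conv":
        h, w, channels = conv_valid(h), conv_valid(w), filters
    else:
        h, w = pool(h), pool(w)

dense_units = 150

# Flatten feeds every spatial position of every feature map into the dense layer.
flatten_features = h * w * channels
params_flatten_dense = flatten_features * dense_units + dense_units

# Global average pooling keeps one value per feature map.
params_gap_dense = channels * dense_units + dense_units

print((h, w, channels), params_flatten_dense, params_gap_dense)
```

Under these assumptions the Flatten + Dense(150) pair alone holds roughly five million weights, which is a lot of capacity for 1000 training images and may help explain how quickly the model overfits.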
So I've been building a convolutional neural network. I'm trying to predict whether a boardgame state (10x10 matrix) will lead to a win (binary 0 or 1) or not.
I have six million examples, which you would think would be enough, but clearly not, as my network is predicting all of one class...
Is there something obvious I'm missing? I tried giving it even 10 examples and it still predicts them all as the same class.
The input matrices are 10x10 of integers.
Input reshaping:
x_train = x_train.reshape(len(x_train),10,10,1)
Actual model building:
model = Sequential()
model.add(Conv2D(3, kernel_size=(1, 1), strides=(1, 1), activation='relu', input_shape=(10,10,1)))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(1, 1)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(500, activation='tanh'))
model.add(Dropout(0.5))
model.add(keras.layers.Dense(75, activation='relu'))
model.add(BatchNormalization())
model.add(keras.layers.Dense(10, activation='sigmoid'))
model.add(keras.layers.Dense(1,kernel_initializer='normal',activation='sigmoid'))
optimizerr = keras.optimizers.SGD(lr=0.001, momentum=0.9, decay=0.01, nesterov=True)
model.compile(optimizer=optimizerr, loss='binary_crossentropy', metrics=[metrics.binary_accuracy])
model.fit(x_train, y_train,epochs = 100, batch_size = 128, verbose=1)
I've tried modifying the learning rate, momentum, decay, the kernel_sizes, layer types, sizes... I checked for dying relu and that didn't seem to be the problem. Removing the dropout/batch normalization layers (or various random layers) didn't do anything either.
The data have roughly 53/47% split across the labels, so it's not that either.
I'm more confused because even when I ask it to predict the train set, it STILL insists on only labeling things one class, even if there are only ~20 samples or fewer.
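One thing that may be worth checking in the setup above is how fast the learning rate shrinks. The sketch below assumes the legacy Keras behaviour for the decay argument, where the rate at a given iteration is lr / (1 + decay * iteration); if the optimizer version in use handles decay differently, the numbers do not apply. It is a possible contributing factor, not a confirmed diagnosis.

```python
def decayed_lr(initial_lr, decay, iteration):
    """Time-based decay: lr / (1 + decay * iteration) (assumed legacy Keras rule)."""
    return initial_lr / (1.0 + decay * iteration)

# Values taken from the question's SGD and fit() calls.
initial_lr, decay = 0.001, 0.01
samples, batch_size = 6_000_000, 128

# Number of weight updates in one epoch (rounded up).
steps_per_epoch = -(-samples // batch_size)

lr_after_100_steps = decayed_lr(initial_lr, decay, 100)
lr_after_one_epoch = decayed_lr(initial_lr, decay, steps_per_epoch)

print(steps_per_epoch, lr_after_100_steps, lr_after_one_epoch)
```

With six million examples the rate would fall by more than two orders of magnitude within the first epoch under this rule, so a run with decay=0 is a cheap experiment. On the tiny 10 to 20 sample runs there are too few updates for decay to matter much, so it would not explain those on its own.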