I recently built a simple convolutional neural network for hand gesture image recognition, using background subtraction so that the hand appears as a white shape on a black background. It was built mostly with Keras Conv2D layers. My dataset has 1000 pictures for training and 100 pictures for validation and testing.
The problem oddly occurs immediately after the first epoch, during which the model's loss drops a great deal: it typically falls from some large number, like 183, down to about 1 at the start of the second epoch. All the pictures in the dataset are of my own hand, captured with cv2, and I only tested with my own hand, so that should not be a problem.
In case the dataset was the issue, I tried three different datasets, one of them using cv2's Canny method, which essentially traces an outline of the hand and blacks out the rest of the picture, to see if that made a difference. Regardless, the same thing kept happening. I have also added multiple Dropout layers in different places to see their effect, and every time the loss drops drastically and the model shows signs of overfitting. I have also implemented EarlyStopping and tried adding more layers, but the same result always occurs.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from keras.callbacks import EarlyStopping, ModelCheckpoint

model = Sequential()
model.add(Conv2D(32, (3,3), activation = 'relu',
                 input_shape = (240, 215, 1)))
model.add(MaxPooling2D((2,2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3,3), activation = 'relu'))
model.add(Conv2D(64, (3,3), activation = 'relu'))
model.add(MaxPooling2D((2,2)))
model.add(Dropout(0.25))
model.add(Conv2D(128, (3,3), activation = 'relu'))
model.add(MaxPooling2D((2,2)))
model.add(Dropout(0.25))
model.add(Conv2D(256, (3,3), activation = 'relu'))
model.add(MaxPooling2D((2,2)))
model.add(Dropout(0.25))
#model.add(Conv2D(256, (3,3), activation = 'relu'))
#model.add(MaxPooling2D((2,2)))
#model.add(Conv2D(128, (3,3), activation = 'relu'))
#model.add(MaxPooling2D((2,2)))
#model.add(Conv2D(64, (3,3), activation = 'relu'))
#model.add(MaxPooling2D((2,2)))
model.add(Flatten())
model.add(Dense(150, activation = 'relu'))
#model.add(Dropout(0.25))
#model.add(Dense(1000, activation = 'relu'))
model.add(Dropout(0.75))
model.add(Dense(6, activation = 'softmax'))
model.summary()
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy',
metrics = ['acc'])
callbacks_list = [EarlyStopping(monitor = 'val_loss', patience = 10),
ModelCheckpoint(filepath = 'model.h6', monitor = 'val_loss',
save_best_only = True),]
The commented-out sections of the code are changes I have tried. I have also varied the Dropout values and their positions a great deal, and nothing significant has changed. Could anyone offer any advice on why my model overfits this quickly?
Yes, it is a clear case of overfitting. Here are my suggestions:
Try reducing the number of hidden layers.
Increase the dropout to 0.5.
Create more synthetic images or apply transformations to the raw images (a data-augmentation sketch follows below).
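For the augmentation point, here is a minimal sketch using Keras' ImageDataGenerator; the directory path, target size, and parameter values are placeholders, not taken from the original post:
from keras.preprocessing.image import ImageDataGenerator

# geometric transformations only, since the hand masks are binary images
train_datagen = ImageDataGenerator(rotation_range=15,
                                   width_shift_range=0.1,
                                   height_shift_range=0.1,
                                   zoom_range=0.1)

# 'data/train' is a placeholder path; target_size should match the model input
train_generator = train_datagen.flow_from_directory('data/train',
                                                    target_size=(240, 215),
                                                    color_mode='grayscale',
                                                    batch_size=32,
                                                    class_mode='categorical')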
When dealing with such massive overfitting, a good starting point is to reduce the number of layers.
Although you add Dropout after many of the max-pooling layers, you still suffer from overfitting.
Below are my recommendations:
Ensure that you have a comprehensive dataset with clean labels. Regardless of how we tune the neural network, if the dataset is not clean we cannot obtain good results.
Add, for a start, at most 3 stacks of convolution + max-pooling + dropout; (32 + 64 + 128) filters would be a good starting point (see the sketch after this list).
Use GlobalAveragePooling2D instead of Dense layers. The latter are not needed in a convolutional neural network, except for the last layer with sigmoid or softmax.
Try using SpatialDropout2D. Compared to typical Dropout, which is applied to each element in the feature map, SpatialDropout drops entire feature maps.
Try Data Augmentation. This way you create more artificial examples and your network will be less prone to overfitting.
If none of these work, use a pre-trained network and apply transfer learning to your task at hand.
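A minimal sketch combining several of these recommendations (three convolution stacks, SpatialDropout2D, and GlobalAveragePooling2D); the filter counts and dropout rates are illustrative, not tuned for the asker's data:
from keras.models import Sequential
from keras.layers import (Conv2D, MaxPooling2D, SpatialDropout2D,
                          GlobalAveragePooling2D, Dense)

model = Sequential()
# three convolution + max-pooling + dropout stacks (32, 64, 128 filters)
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(240, 215, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(SpatialDropout2D(0.2))   # drops whole feature maps

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(SpatialDropout2D(0.2))

model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(SpatialDropout2D(0.2))

# GlobalAveragePooling2D replaces Flatten + the intermediate Dense layer
model.add(GlobalAveragePooling2D())
model.add(Dense(6, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['acc'])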
Let's say I have a dataset containing many time series from a leg-worn accelerometer sensor. Each time series has peaks, and the peaks correspond to jumps made by the person wearing the sensor. I now want to use the data to train a convolutional neural network so that it predicts how many jumps were performed in a given time series.
The problem I am having is that the CNN should work for any number of peaks/jumps. Obviously, it is impossible to generate a training dataset that provides samples for every possible number of jumps/peaks, since the dataset would then have to be infinitely large. However, as far as I know, in multiclass classification the final layer of a CNN must contain as many nodes as there are possible outcomes.
How should the final layer be designed in order to predict any possible number of peaks between 0 and infinity? Is this even possible?
As an example, find my very basic CNN setup here:
import keras
from keras.layers import Conv1D, Dropout, Flatten, Dense

model = keras.Sequential()
model.add(Conv1D(filters=32, kernel_size=2, activation = 'relu', strides = 1, padding = 'same', input_shape=(3200, 1)))
model.add(Dropout(0.3))
model.add(Conv1D(filters=64, kernel_size=2, activation = 'relu', strides = 1, padding = 'same'))
model.add(Flatten())
model.add(Dense(32, activation = 'relu'))
model.add(Dense(?, activation='softmax')) # ? represents the infinite number of output units I am asking about in the question
Here is my code for CNN training on image recognition:
# definition of the model
from keras.models import Sequential
from keras.layers import (Conv2D, MaxPooling2D, Dropout, Flatten,
                          Dense, Activation, LeakyReLU)

def make_model():
    model = Sequential()
    model.add(Conv2D(16, (3,3), input_shape = (32,32,3), padding = "same",
                     kernel_initializer = "glorot_uniform"))
    model.add(LeakyReLU(alpha=0.1))
    model.add(Conv2D(32, (3,3), input_shape = (32,32,3), padding = "same",
                     kernel_initializer = "glorot_uniform"))
    model.add(LeakyReLU(alpha=0.1))
    model.add(MaxPooling2D(pool_size = (2,2), padding = "same"))
    model.add(Dropout(0.25))
    model.add(Conv2D(32, (3,3), input_shape = (32,32,3), padding = "same"))
    model.add(LeakyReLU(alpha=0.1))
    model.add(Conv2D(64, (3,3), input_shape = (32,32,3), padding = "same"))
    model.add(LeakyReLU(alpha=0.1))
    model.add(MaxPooling2D(pool_size = (2,2), padding = "same"))
    model.add(Dropout(0.25))
    # classifier layers
    model.add(Flatten())
    model.add(Dense(256))
    # activation
    model.add(LeakyReLU(alpha=0.1))
    model.add(Dropout(0.5))
    model.add(Dense(10))
    # activation
    model.add(LeakyReLU(alpha=0.1))
    model.add(Activation("softmax"))
    return model
And then it got stuck at a result that freaked me out:
loss: 7.4918; acc: 0.1226.
I have tried a few more things, but I don't know exactly what I should do to get on the right path.
Without more details about the problem, it is difficult to investigate further.
But I would encourage you to look more into:
BatchNormalization
loss function
learning rate
optimizer
hidden layers
The current state of the art is to apply convolution together with Batch Normalization and ReLU activation. The order should be the following:
Convolution
Batch Normalization
ReLU (it could also be LeakyReLU or any other activation)
So you should add BatchNormalization after your convolutions, and you should also remove the Dropout. Many researchers have found that Dropout is not needed when BatchNormalization is used, and that BatchNormalization actually performs better.
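As an illustration of that ordering, here is a minimal sketch of one convolution block; the filter count and input shape simply mirror the question's code:
from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, LeakyReLU, MaxPooling2D

model = Sequential()
# convolution -> batch normalization -> activation, with no dropout
model.add(Conv2D(32, (3, 3), padding="same", input_shape=(32, 32, 3)))
model.add(BatchNormalization())
model.add(LeakyReLU(alpha=0.1))
model.add(MaxPooling2D(pool_size=(2, 2)))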
Other than this, you should probably play around with parameters such as the learning rate, the number of filters, etc.
Also make sure that you are using the correct loss and an output activation that corresponds to that loss.
This is part of the code for constructing a CNN in a book.
I don't understand why filters = 64 here. As far as I know, this is the number of feature maps. How do I determine this number when I make my own CNN?
# Keras imports needed for this snippet
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense, Activation
from keras.utils import plot_model

# network parameters
# image is processed as is (square grayscale)
input_shape = (image_size, image_size, 1)
batch_size = 128
kernel_size = 3
pool_size = 2
filters = 64
dropout = 0.2
model = Sequential()
model.add(Conv2D(filters = filters,
kernel_size = kernel_size,
activation = 'relu',
input_shape = input_shape))
model.add(MaxPooling2D(pool_size))
model.add(Conv2D(filters = filters,
kernel_size = kernel_size,
activation = 'relu'))
model.add(MaxPooling2D(pool_size))
model.add(Conv2D(filters = filters,
kernel_size = kernel_size,
activation = 'relu'))
model.add(Flatten())
# dropout added as regularizer
model.add(Dropout(dropout))
# output layer is 10-dim one-hot vector
model.add(Dense(num_labels))
model.add(Activation('softmax'))
model.summary()
plot_model(model, to_file='cnn-mnist.png', show_shapes=True)
Filters are the number of features you want to detect in the image; they are also known as feature detectors. It is a hyperparameter and entirely up to you.
There are several well-known CNN architectures. It is best to look for existing solutions to the problem you are trying to solve with your CNN, then tune the filter value and check whether the accuracy increases.
You can choose the number of filters based on the complexity of the task. The number of filters tends to increase with each layer, since the first layer extracts simple features and later layers extract more complex ones (see the sketch after the link below). Have a look at the link below for further reference. Hope I helped :)
https://stackoverflow.com/a/48243420/9024042
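As an illustration of the "filters increase with depth" convention, here is a minimal sketch; the specific counts (32, 64, 128) and the 28x28 grayscale input are just common example values, not a rule:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D

model = Sequential()
# early layer: fewer filters for simple features such as edges
model.add(Conv2D(32, kernel_size=3, activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(2))
# deeper layers: more filters for increasingly complex features
model.add(Conv2D(64, kernel_size=3, activation='relu'))
model.add(MaxPooling2D(2))
model.add(Conv2D(128, kernel_size=3, activation='relu'))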
I'm researching the possibility of implementing a CNN in order to classify images as "good" or "bad" but am having no luck with my current architecture.
Characteristics that denote a "bad" image:
Overexposure
Oversaturation
Incorrect white balance
Blurriness
Would it be feasible to implement a neural network to classify images based on these characteristics or is it best left to a traditional algorithm that simply looks at the variance in brightness/contrast throughout an image and classifies it that way?
I have attempted training a CNN using the VGGNet architecture but I always seem to get a biased and unreliable model, regardless of the number of epochs or number of steps.
My current model's architecture is very simple (as I am new to the whole machine learning world) but seemed to work fine with other classification problems, and I have modified it slightly to work better with this binary classification problem:
from keras.models import Sequential
from keras.layers import (Conv2D, Activation, BatchNormalization,
                          MaxPooling2D, Dropout, Flatten, Dense)

# input_shape and channel_dimension are defined earlier in the script
model = Sequential()

# CONV => RELU => POOL layer set
# define convolutional layers, use "ReLU" activation function
# and reduce the spatial size (width and height) with pool layers
model.add(Conv2D(32, (3, 3), padding="same", input_shape=input_shape)) # 32 3x3 filters (height, width, depth)
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)
# (CONV => RELU) * 2 => POOL layer set (increasing number of layers as you go deeper into CNN)
model.add(Conv2D(64, (3, 3), padding="same", input_shape=input_shape)) # 64 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(Conv2D(64, (3, 3), padding="same", input_shape=input_shape)) # 64 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)
# (CONV => RELU) * 3 => POOL layer set (input volume size becoming smaller and smaller)
model.add(Conv2D(128, (3, 3), padding="same", input_shape=input_shape)) # 128 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(Conv2D(128, (3, 3), padding="same", input_shape=input_shape)) # 128 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(Conv2D(128, (3, 3), padding="same", input_shape=input_shape)) # 128 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)
# only set of FC => RELU layers
model.add(Flatten())
model.add(Dense(512))
model.add(Activation("relu"))
model.add(BatchNormalization())
model.add(Dropout(0.5))
# sigmoid classifier (output layer)
model.add(Dense(classes))
model.add(Activation("sigmoid"))
Are there any glaring omissions or mistakes in this model, or can I simply not solve this problem using deep learning (with my current GPU, a GTX 970)?
Thanks for your time and experience,
Josh
EDIT:
Here is my code for compiling/training the model:
from keras.optimizers import SGD
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, TensorBoard
from time import time

# initialise the model and optimiser
# (initial_lr and epochs are defined earlier in the script)
print("[INFO] Training network...")
opt = SGD(lr=initial_lr, decay=initial_lr / epochs)
model.compile(loss="sparse_categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
# set up checkpoints
model_name = "output/50_epochs_{epoch:02d}_{val_acc:.2f}.model"
checkpoint = ModelCheckpoint(model_name, monitor='val_acc', verbose=1,
save_best_only=True, mode='max')
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2,
patience=5, min_lr=0.001)
tensorboard = TensorBoard(log_dir="logs/{}".format(time()))
callbacks_list = [checkpoint, reduce_lr, tensorboard]
# train the network
H = model.fit_generator(training_set, steps_per_epoch=500, epochs=50, validation_data=test_set, validation_steps=150, callbacks=callbacks_list)
Independently of any other advice (including the answer already provided), and assuming classes=2 (which you don't clarify - there is a reason we ask for an MCVE here), you seem to be making a fundamental mistake in your final layer, i.e.:
# sigmoid classifier (output layer)
model.add(Dense(classes))
model.add(Activation("sigmoid"))
A sigmoid activation is suitable only if your final layer consists of a single node; if classes=2, as I suspect, based also on your puzzling statement in the comments that
with three different images, my results are 0.987 bad and 0.999 good
and
I was giving you the predictions from the model previously
you should use a softmax activation, i.e.
model.add(Dense(classes))
model.add(Activation("softmax"))
Alternatively, you could use sigmoid, but your final layer should consist of a single node, i.e.
model.add(Dense(1))
model.add(Activation("sigmoid"))
The latter is usually preferred in binary classification settings, but the results should be the same in principle.
UPDATE (after updating the question):
sparse_categorical_crossentropy is not the correct loss here, either.
All in all, try the following changes:
model.compile(loss="binary_crossentropy", optimizer=Adam(), metrics=["accuracy"])
# final layer:
model.add(Dense(1))
model.add(Activation("sigmoid"))
with the Adam optimizer (which needs an import; see the sketch below). Also, dropout should not be used by default - see this thread; start without it and add it only if necessary (i.e. if you see signs of overfitting).
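For completeness, a minimal sketch of those changes together with the required import (assuming the standalone Keras API used in the rest of the question):
from keras.optimizers import Adam

# single-node sigmoid head with binary cross-entropy, as suggested above
model.add(Dense(1))
model.add(Activation("sigmoid"))
model.compile(loss="binary_crossentropy", optimizer=Adam(), metrics=["accuracy"])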
I suggest you go for transfer learning instead of training the whole network.
Use weights trained on a huge dataset like ImageNet.
You can easily do this with Keras: import a model with pre-trained weights, such as Xception, replace the last layer (which represents the 1000 ImageNet classes) with a 2-node Dense layer, since you have only 2 classes, and set trainable=False for the base layers and trainable=True for the custom added layers, such as the 2-node Dense layer.
Then you can train the model in the usual way.
Demo code -
from keras.applications import Xception
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D

base_model = Xception(input_shape=(img_width, img_height, 3),
                      weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
predictions = Dense(2, activation='softmax')(x)
model = Model(base_model.input, predictions)

# freezing the base layer weights
for layer in base_model.layers:
    layer.trainable = False
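To complete the demo, here is a minimal sketch of how the frozen-base model could then be compiled and trained; it reuses the asker's training_set / test_set generators and step counts, and epochs=10 is an arbitrary choice:
# compile after freezing so that only the new Dense head is updated
# (assumes the generators yield one-hot labels; use sparse_categorical_crossentropy
#  for integer labels instead)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

model.fit_generator(training_set, steps_per_epoch=500, epochs=10,
                    validation_data=test_set, validation_steps=150)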
So I've been building a convolutional neural network. I'm trying to predict whether a boardgame state (10x10 matrix) will lead to a win (binary 0 or 1) or not.
I have six million examples, which you would think would be enough, but clearly not, as my network is predicting all of one class...
Is there something obvious I'm missing? I tried giving it even 10 examples and it still predicts them all as the same class.
The input matrices are 10x10 of integers.
Input reshaping:
x_train = x_train.reshape(len(x_train),10,10,1)
Actual model building:
import keras
from keras import metrics
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, BatchNormalization

model = Sequential()
model.add(Conv2D(3, kernel_size=(1, 1), strides=(1, 1), activation='relu', input_shape=(10,10,1)))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(1, 1)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(500, activation='tanh'))
model.add(Dropout(0.5))
model.add(keras.layers.Dense(75, activation='relu'))
model.add(BatchNormalization())
model.add(keras.layers.Dense(10, activation='sigmoid'))
model.add(keras.layers.Dense(1,kernel_initializer='normal',activation='sigmoid'))
optimizerr = keras.optimizers.SGD(lr=0.001, momentum=0.9, decay=0.01, nesterov=True)
model.compile(optimizer=optimizerr, loss='binary_crossentropy', metrics=[metrics.binary_accuracy])
model.fit(x_train, y_train,epochs = 100, batch_size = 128, verbose=1)
I've tried modifying the learning rate, momentum, decay, the kernel_sizes, layer types, sizes... I checked for dying relu and that didn't seem to be the problem. Removing the dropout/batch normalization layers (or various random layers) didn't do anything either.
The data have roughly a 53/47% split across the labels, so it's not that either.
I'm more confused because even when I ask it to predict the train set, it STILL insists on only labeling things one class, even if there are only ~20 samples or fewer.