Overfitting problem with my validation data - python

I am applying a CNN model using Keras. I feed the detail coefficients of a level-5 discrete wavelet transform as a 2D array of size (5, 3840) into the CNN. I would like to use the CNN to predict seizures. The problem is that my network is overfitting. Any suggestions on how to solve this overfitting problem?
import keras
from keras.models import Sequential
from keras.layers import Conv3D, MaxPooling3D, BatchNormalization, Flatten, Dropout, Dense

def build_model():
    input_shape = (1, 22, 5, 3844)
    model = Sequential()
    # C1
    model.add(Conv3D(16, (22, 5, 5), strides=(1, 2, 2), padding='same', activation='relu',
                     data_format="channels_first", input_shape=input_shape))
    model.add(MaxPooling3D(pool_size=(1, 2, 2), data_format="channels_first", padding='same'))
    model.add(BatchNormalization())
    # C2
    model.add(Conv3D(32, (1, 3, 3), strides=(1, 1, 1), padding='same',
                     data_format="channels_first", activation='relu'))  # unsure whether to remove padding
    model.add(MaxPooling3D(pool_size=(1, 2, 2), data_format="channels_first"))
    model.add(BatchNormalization())
    # C3
    model.add(Conv3D(64, (1, 3, 3), strides=(1, 1, 1), padding='same',
                     data_format="channels_first", activation='relu'))  # unsure whether to remove padding
    model.add(MaxPooling3D(pool_size=(1, 2, 2), data_format="channels_first", padding='same'))
    model.add(BatchNormalization())
    model.add(Flatten())
    model.add(Dropout(0.5))
    model.add(Dense(256, activation='sigmoid'))
    model.add(Dropout(0.5))
    model.add(Dense(2, activation='softmax'))
    opt_adam = keras.optimizers.Adam(lr=0.00001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
    model.compile(loss='categorical_crossentropy', optimizer=opt_adam, metrics=['accuracy'])
    return model

There are two frequently used regularization techniques to avoid over-fitting:
L1 & L2 regularization: regularizers allow you to apply penalties on layer parameters or layer activity during optimization. These penalties are incorporated into the loss function that the network optimizes.
from keras import regularizers
model.add(Dense(64, input_dim=64,
                kernel_regularizer=regularizers.l2(0.01),
                activity_regularizer=regularizers.l1(0.01)))
Dropout: dropout consists of randomly setting a fraction of the input units to 0 at each update during training, which helps prevent over-fitting.
from keras.layers import Dropout
model.add(Dense(60, input_dim=60, activation='relu'))
model.add(Dropout(rate=0.2))
model.add(Dense(30, activation='relu'))
model.add(Dropout(rate=0.2))
model.add(Dense(1, activation='sigmoid'))
You can also use EarlyStopping to interrupt training when the validation loss stops decreasing:
from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='val_loss', patience=2)
model.fit(x, y, validation_split=0.2, callbacks=[early_stopping])
Additionally, you might want to consider data augmentation techniques such as cropping, padding, and horizontal flipping. With these techniques you can increase the diversity of the data available to your model without actually collecting new data, which helps it capture invariances and reduces over-fitting:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
y_train = np_utils.to_categorical(y_train, num_classes)
y_test = np_utils.to_categorical(y_test, num_classes)
datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)
datagen.fit(x_train)  # computes the statistics needed by featurewise_center/std_normalization
model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
                    steps_per_epoch=len(x_train) // 32, epochs=epochs)

Steps to reduce overfitting:
Reduce the number of units in your hidden layers.
I do not think you need a softmax layer right after a sigmoid layer; your model is probably overfitting because of that.
Try replacing the sigmoid layer with a dense layer with relu activation and output (n, 2), followed by your softmax layer (see the sketch below).
Your learning rate is also very low, which suggests your model should take a long time to reach the minimum and therefore underfit, but that is not happening here. This strengthens my suspicion that the sigmoid layer is the cause.
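A minimal sketch of that change, keeping everything else in the question's model the same (only the head after Flatten is shown; the Dense(256) size and Dropout rates are carried over from the question):
model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(256, activation='relu'))   # relu instead of sigmoid
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))  # softmax output over the two classes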

Related

Is passing activity_regularizer as an argument to Conv2D() the same as passing it separately right after Conv2D()? (Tensorflow)

I was wondering whether creating the model by passing activity_regularizer='l1_l2' as an argument to Conv2D()
model = keras.Sequential()
model.add(Conv2D(filters=16, kernel_size=(6, 6), strides=3, padding='valid', activation='relu',
                 activity_regularizer='l1_l2', input_shape=X_train[0].shape))
model.add(Dropout(0.2))
model.add(MaxPooling2D(pool_size=(3, 1), strides=3, padding='valid'))
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
model.compile(optimizer=Adam(learning_rate=0.001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
history = model.fit(X_train, y_train, epochs=10, validation_data=(X_val, y_val), verbose=0)
will mathematically make a difference compared to creating the model by adding model.add(ActivityRegularization(l1=..., l2=...)) separately?
model = keras.Sequential()
model.add(Conv2D(filters=16, kernel_size=(6, 6), strides=3, padding='valid', activation='relu',
                 input_shape=X_train[0].shape))
model.add(Dropout(0.2))
model.add(ActivityRegularization(l1=some_number, l2=some_number))
model.add(MaxPooling2D(pool_size=(3, 1), strides=3, padding='valid'))
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
model.compile(optimizer=Adam(learning_rate=0.001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
history = model.fit(X_train, y_train, epochs=10, validation_data=(X_val, y_val), verbose=0)
For me, it is hard to tell, as training always involves some randomness. But the results seem similar.
One additional question I have: I accidentally passed the activity_regularizer='l1_l2' argument to the MaxPooling2D() layer before, and the code ran. How can that be, considering that activity_regularizer is not listed as an argument of MaxPooling2D() in the TensorFlow documentation?
Technically, if you are not applying any other constraint on the layer output, applying the activity regularizer inside the convolution layer is the same as applying it outside the convolution layer. However, applying it outside gives the user more flexibility. For instance, the user might want to regularize the output units after skip connections are set up rather than right after the convolution. It is just like having an activation function inside the convolution layer versus using keras.activations to apply the activation after the convolution layer; sometimes this is done after batch normalization.
For your second question, the MaxPool2D layer does accept the activity_regularizer argument. Even though this is not mentioned in its documentation, it makes sense intuitively, since the user might want to regularize the outputs after max-pooling. You can check that activity_regularizer works not only with the MaxPool2D layer but also with other layers, such as the BatchNormalization layer, for the same reason.
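As a quick illustration of why this works (a sketch, not from the original post): the activity_regularizer keyword is handled by the common base Layer class, so pooling and normalization layers accept it too.
import tensorflow as tf
# Both layers take the keyword because it is handled by tf.keras.layers.Layer
pool = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), activity_regularizer='l1_l2')
bn = tf.keras.layers.BatchNormalization(activity_regularizer='l1_l2')
print(pool.activity_regularizer, bn.activity_regularizer)  # the regularizer is stored on each layer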
tf.keras.layers.Conv2D(64, 2, padding='same', activity_regularizer='l1_l2')
and this code,
tf.keras.layers.Conv2D(64, 2, padding='same')
tf.keras.layers.ActivityRegularization(l1=..., l2=...)  # with the same factors as the 'l1_l2' regularizer above
They both do the same job: applying the regularizer inside or outside the layer has the same impact. On the backend, TensorFlow builds a graph that first applies the conv layer and then the activity regularization, so in both cases the computation is carried out in the same way, with no difference.
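One way to convince yourself of this (a minimal sketch with hypothetical shapes and regularization factors) is to copy the convolution weights across the two variants and compare the regularization losses they register after a forward pass:
import numpy as np
import tensorflow as tf
x = np.random.rand(4, 8, 8, 3).astype("float32")
# Variant A: regularizer passed to the layer itself
inner = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 2, padding='same',
                           activity_regularizer=tf.keras.regularizers.L1L2(l1=0.01, l2=0.01),
                           input_shape=(8, 8, 3)),
])
# Variant B: a separate ActivityRegularization layer after the convolution
outer = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 2, padding='same', input_shape=(8, 8, 3)),
    tf.keras.layers.ActivityRegularization(l1=0.01, l2=0.01),
])
outer.layers[0].set_weights(inner.layers[0].get_weights())  # identical conv weights
inner(x)  # the forward pass populates model.losses
outer(x)
print(inner.losses, outer.losses)  # the two penalty terms should match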

How to build a 1D CNN

I am trying to use a CNN for classification. My training data (shown in the picture below) has 9923 samples, each containing 1,000 numeric values.
My current model has only around 10 percent accuracy and I am wondering if anyone knows if I am doing something wrong.
model = Sequential()
model.add(Conv1D(64, 3, activation='relu', input_shape=(1000, 1)))
model.add(MaxPooling1D(2))
model.add(Conv1D(64, 3, activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(28, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs=30, validation_split=0.1)

Validation data performing worse than training data in keras

I am training a CNN on some text data. The sentences are padded and embedded and fed to a CNN. The model architecture is:
model = Sequential()
model.add(Embedding(max_features, embedding_dims, input_length=maxlen))
model.add(Conv1D(128, 5, activation='relu'))
model.add(GlobalMaxPooling1D())
model.add(Dense(50, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(50, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(25, activation='relu'))
# model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
Any help would be appreciated.
Your model is over-fitting, so the best practice is:
Add layers, preferably with sizes that are powers of 2. Instead of
model.add(Dense(50, activation='relu'))
use
model.add(Dense(64, activation='relu'))
and go with 512, 128, 64, 32, 16.
Add some dropout layers, preferably after every two layers (see the sketch after this list).
Train on more data.
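A minimal sketch of that suggestion applied to the question's model (max_features, embedding_dims and maxlen are the values from the question; the 0.3 dropout rate is only an illustration):
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense, Dropout
model = Sequential()
model.add(Embedding(max_features, embedding_dims, input_length=maxlen))
model.add(Conv1D(128, 5, activation='relu'))
model.add(GlobalMaxPooling1D())
model.add(Dense(512, activation='relu'))  # power-of-two layer sizes
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.3))                   # dropout after every two Dense layers
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])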
You can try removing BatchNormalization and adding more convolutional and pooling layers; that may increase your accuracy.
You can also check out this:
https://forums.fast.ai/t/batch-normalization-with-a-large-batch-size-breaks-validation-accuracy/7940

CNN architecture: classifying "good" and "bad" images

I'm researching the possibility of implementing a CNN in order to classify images as "good" or "bad" but am having no luck with my current architecture.
Characteristics that denote a "bad" image:
Overexposure
Oversaturation
Incorrect white balance
Blurriness
Would it be feasible to implement a neural network to classify images based on these characteristics or is it best left to a traditional algorithm that simply looks at the variance in brightness/contrast throughout an image and classifies it that way?
I have attempted training a CNN using the VGGNet architecture but I always seem to get a biased and unreliable model, regardless of the number of epochs or number of steps.
My current model's architecture is very simple (as I am new to the whole machine learning world) but seemed to work fine with other classification problems, and I have modified it slightly to work better with this binary classification problem:
# CONV => RELU => POOL layer set
# define convolutional layers, use "ReLU" activation function
# and reduce the spatial size (width and height) with pool layers
model.add(Conv2D(32, (3, 3), padding="same", input_shape=input_shape)) # 32 3x3 filters (height, width, depth)
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)
# (CONV => RELU) * 2 => POOL layer set (increasing number of layers as you go deeper into CNN)
model.add(Conv2D(64, (3, 3), padding="same", input_shape=input_shape)) # 64 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(Conv2D(64, (3, 3), padding="same", input_shape=input_shape)) # 64 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)
# (CONV => RELU) * 3 => POOL layer set (input volume size becoming smaller and smaller)
model.add(Conv2D(128, (3, 3), padding="same", input_shape=input_shape)) # 128 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(Conv2D(128, (3, 3), padding="same", input_shape=input_shape)) # 128 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(Conv2D(128, (3, 3), padding="same", input_shape=input_shape)) # 128 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)
# only set of FC => RELU layers
model.add(Flatten())
model.add(Dense(512))
model.add(Activation("relu"))
model.add(BatchNormalization())
model.add(Dropout(0.5))
# sigmoid classifier (output layer)
model.add(Dense(classes))
model.add(Activation("sigmoid"))
Are there any glaring omissions or mistakes in this model, or can I simply not solve this problem using deep learning (with my current GPU, a GTX 970)?
Thanks for your time and experience,
Josh
EDIT:
Here is my code for compiling/training the model:
# initialise the model and optimiser
print("[INFO] Training network...")
opt = SGD(lr=initial_lr, decay=initial_lr / epochs)
model.compile(loss="sparse_categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
# set up checkpoints
model_name = "output/50_epochs_{epoch:02d}_{val_acc:.2f}.model"
checkpoint = ModelCheckpoint(model_name, monitor='val_acc', verbose=1,
                             save_best_only=True, mode='max')
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2,
                              patience=5, min_lr=0.001)
tensorboard = TensorBoard(log_dir="logs/{}".format(time()))
callbacks_list = [checkpoint, reduce_lr, tensorboard]
# train the network
H = model.fit_generator(training_set, steps_per_epoch=500, epochs=50, validation_data=test_set, validation_steps=150, callbacks=callbacks_list)
Independently of any other advice (including the answer already provided), and assuming classes=2 (which you don't clarify - there is a reason we ask for an MCVE here), you seem to be making a fundamental mistake in your final layer, i.e.:
# sigmoid classifier (output layer)
model.add(Dense(classes))
model.add(Activation("sigmoid"))
A sigmoid activation is suitable only if your final layer consists of a single node; if classes=2, as I suspect, based also on your puzzling statement in the comments that
with three different images, my results are 0.987 bad and 0.999 good
and
I was giving you the predictions from the model previously
you should use a softmax activation, i.e.
model.add(Dense(classes))
model.add(Activation("softmax"))
Alternatively, you could use sigmoid, but your final layer should consist of a single node, i.e.
model.add(Dense(1))
model.add(Activation("sigmoid"))
The latter is usually preferred in binary classification settings, but the results should be the same in principle.
UPDATE (after updating the question):
sparse_categorical_crossentropy is not the correct loss here, either.
All in all, try the following changes:
model.compile(loss="binary_crossentropy", optimizer=Adam(), metrics=["accuracy"])
# final layer:
model.add(Dense(1))
model.add(Activation("sigmoid"))
with the Adam optimizer (which needs to be imported). Also, dropout should not be used by default - see this thread; start without it and add it only if necessary (i.e. if you see signs of overfitting).
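Put together, a short sketch of those changes (the import shown assumes the standalone keras package used elsewhere in this question):
from keras.optimizers import Adam
# final layer: a single sigmoid node for binary classification
model.add(Dense(1))
model.add(Activation("sigmoid"))
model.compile(loss="binary_crossentropy", optimizer=Adam(), metrics=["accuracy"])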
I suggest you go for transfer learning instead of training the whole network.
Use weights trained on a huge dataset like ImageNet.
You can do this easily in Keras: import a model with pre-trained weights (such as Xception), replace the last layer (which represents the 1000 ImageNet classes) with a 2-node dense layer, since you have only 2 classes, and set trainable=False for the base layers and trainable=True for the custom added layers (such as the 2-node dense layer).
Then you can train the model in the usual way.
Demo code:
from keras.applications import Xception
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model
base_model = Xception(input_shape=(img_width, img_height, 3), weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
predictions = Dense(2, activation='softmax')(x)
model = Model(base_model.input, predictions)
# freeze the base layer weights
for layer in base_model.layers:
    layer.trainable = False
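A short sketch of how the frozen-base model could then be compiled and trained, assuming the training_set and test_set generators from the question's training code and the integer labels its sparse_categorical_crossentropy loss implies:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(training_set, steps_per_epoch=500, epochs=50,
                    validation_data=test_set, validation_steps=150)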

Keras CNN model for image classification does not generalize well

I want to implement a model in Keras for sentiment classification (anger or non-anger) based on spectrograms. I have generated the spectrograms using the audio dataset from Friends. Each spectrogram covers 8 seconds of audio. In total, I have 9117 train samples, 1006 validation samples and 2402 test samples.
I use a relatively simple CNN architecture and I have tried different combinations of it with optimizers, learning rates and batch sizes, but none of the results seem to generalize well. The training loss decreases nicely up to a certain point, but the validation loss increases with each epoch.
This is the model I am using:
model = Sequential()
model.add(Convolution2D(filters=32, kernel_size=3, strides=1, input_shape=input_shape, activation='relu', padding="same"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(filters=64, kernel_size=3, strides=1, activation='relu', padding="same"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Convolution2D(filters=128, kernel_size=3, strides=1, activation='relu', padding="same"))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(classes, activation='sigmoid')) #output layer
This is how I load the images:
img_rows = 120
img_cols = 160
train_datagen = ImageDataGenerator(rescale=1./255)
validation_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    SPECTOGRAMS_DIRECTORY + TRAIN_SUBDIR,
    target_size=(img_cols, img_rows),
    batch_size=batch_size,
    class_mode='binary')
validation_generator = validation_datagen.flow_from_directory(
    SPECTOGRAMS_DIRECTORY + VALIDATION_SUBDIR,
    target_size=(img_cols, img_rows),
    batch_size=batch_size,
    class_mode='binary')
test_generator = test_datagen.flow_from_directory(
    SPECTOGRAMS_DIRECTORY + TEST_SUBDIR,
    target_size=(img_cols, img_rows),
    batch_size=1,
    class_mode='binary',
    shuffle=False)
input_shape = (img_cols, img_rows, channels)
opt = SGD(lr=0.001)
model.compile(loss='binary_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])
history = model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size,
    verbose=2)
## EVALUATE
print("EVALUATE THE MODEL...")
score = model.evaluate_generator(generator=validation_generator,
                                 steps=nb_validation_samples // batch_size)
The spectrograms look like this:
As I said, I tried different combinations of batch size (16, 32, 64), SGD with a 0.001 learning rate, and Adam with a 0.0001 learning rate, but for every combination the training loss goes down while the validation loss goes up.
The model seems to be over-fitting. You can try the approaches below to overcome this issue (a sketch of the augmentation idea follows the answer):
If possible, try to gather more data, or use data augmentation techniques to increase the number of samples.
You can use dropout in Keras to reduce the over-fitting. (It looks like you have already added dropout; you can try tuning the rates.)
Thank you
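As an illustration of the augmentation suggestion, a minimal sketch that extends the question's training ImageDataGenerator with a couple of transforms; the shift and zoom values are arbitrary, and flips are deliberately left out because mirroring a spectrogram changes its meaning:
train_datagen = ImageDataGenerator(
    rescale=1./255,
    width_shift_range=0.1,  # small shifts along the time axis
    zoom_range=0.1)
train_generator = train_datagen.flow_from_directory(
    SPECTOGRAMS_DIRECTORY + TRAIN_SUBDIR,
    target_size=(img_cols, img_rows),
    batch_size=batch_size,
    class_mode='binary')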
