I am working with a Sequential Keras model and I am trying to figure out the best method for feature scaling.
model = Sequential()
model.add(Masking(mask_value=-50, input_shape=(None,10)))
model.add(LayerNormalization(axis=-1))
model.add(LSTM(100, input_shape=(None,10)))
model.add(Dense(100, activation='relu'))
model.add(Dense(3, activation='softmax'))
print(model.summary())
In line 3, I have a LayerNormalization layer which, according to the documentation, scales to mean and standard deviation. However, I have also come across BatchNormalization and tf.keras.layers.experimental.preprocessing.Normalization. My question is: is this method similar to sklearn's StandardScaler(), or is there another method I could use to feature scale within the model?
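For reference, the preprocessing Normalization layer mentioned above is roughly the in-model analogue of sklearn's StandardScaler, because it is adapted to the training data before fitting. A minimal sketch, assuming X_train is a float array of shape (samples, timesteps, 10); the array here is only a placeholder:
import numpy as np
import tensorflow as tf
X_train = np.random.rand(32, 20, 10).astype('float32')  # placeholder data, replace with your own
# adapt() computes the per-feature mean and variance from the data, like StandardScaler.fit()
norm_layer = tf.keras.layers.experimental.preprocessing.Normalization(axis=-1)
norm_layer.adapt(X_train)
scaled = norm_layer(X_train[:2])  # applies (x - mean) / sqrt(var) per feature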
This should work. It uses an UpSampling2D layer on a naive 5x5 image-based input:
from keras.models import Sequential
from keras.layers import Dense, Reshape, UpSampling2D, Conv2D
# define model
model = Sequential()
# define input shape, output enough activations for 128 5x5 feature maps
model.add(Dense(128 * 5 * 5, input_dim=100))
# reshape vector of activations into 128 feature maps with 5x5
model.add(Reshape((5, 5, 128)))
# double spatial size from 128 5x5 feature maps to 128 10x10 feature maps
model.add(UpSampling2D())
# fill in detail in the upsampled feature maps and output a single image
model.add(Conv2D(1, (3,3), padding='same'))
# summarize model
model.summary()
But you can use the Conv2DTranspose layer too, which combines the UpSampling2D and Conv2D layers into one layer.
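A roughly equivalent sketch with Conv2DTranspose, where strides=(2,2) doubles the spatial size and learns the filters in a single layer (the filter count of 1 mirrors the example above):
from keras.models import Sequential
from keras.layers import Dense, Reshape, Conv2DTranspose
model = Sequential()
model.add(Dense(128 * 5 * 5, input_dim=100))
model.add(Reshape((5, 5, 128)))
# upsample 5x5 -> 10x10 and fill in detail in one step
model.add(Conv2DTranspose(1, (3,3), strides=(2,2), padding='same'))
model.summary()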
A TimeDistributed layer in the case of LSTMs will help; refer to the Keras documentation for TimeDistributed.
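A minimal sketch of the idea, with placeholder shapes that are not taken from the question: with return_sequences=True the LSTM emits one vector per timestep, and TimeDistributed applies the same Dense layer to each of those timesteps.
from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense
model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(None, 10)))  # output shape: (batch, timesteps, 64)
model.add(TimeDistributed(Dense(1)))                                # one output per timestep
model.summary()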
I am trying to use a CNN architecture to classify text sentences. The architecture of the network is as follows:
text_input = Input(shape=X_train_vec.shape[1:], name = "Text_input")
conv2 = Conv1D(filters=128, kernel_size=5, activation='relu')(text_input)
drop21 = Dropout(0.5)(conv2)
pool1 = MaxPooling1D(pool_size=2)(drop21)
conv22 = Conv1D(filters=64, kernel_size=5, activation='relu')(pool1)
drop22 = Dropout(0.5)(conv22)
pool2 = MaxPooling1D(pool_size=2)(drop22)
dense = Dense(16, activation='relu')(pool2)
flat = Flatten()(dense)
dense = Dense(128, activation='relu')(flat)
out = Dense(32, activation='relu')(dense)
outputs = Dense(y_train.shape[1], activation='softmax')(out)
model = Model(inputs=text_input, outputs=outputs)
# compile
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
I have some callbacks, such as early_stopping and reduceLR, to stop the training and to reduce the learning rate when the validation loss is not improving (decreasing).
early_stopping = EarlyStopping(monitor='val_loss',
                               patience=5)
model_checkpoint = ModelCheckpoint(filepath=checkpoint_filepath,
                                   save_weights_only=False,
                                   monitor='val_loss',
                                   mode="auto",
                                   save_best_only=True)
learning_rate_decay = ReduceLROnPlateau(monitor='val_loss',
                                        factor=0.1,
                                        patience=2,
                                        verbose=1,
                                        mode='auto',
                                        min_delta=0.0001,
                                        cooldown=0,
                                        min_lr=0)
Once the model is trained, the training history looks as follows:
We can observe here that the validation loss stops improving from epoch 5 onward, while the training loss keeps decreasing, i.e. the model is overfitting the training data more with each step.
I would like to know if I'm doing something wrong in the architecture of the CNN. Are the dropout layers not enough to avoid overfitting? What other ways are there to reduce overfitting?
Any suggestions?
Thanks in advance.
Edit:
I have also tried regularization, and the results were even worse:
kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01)
Edit 2:
I have tried applying BatchNormalization layers after each convolution, and the result is the following:
norm = BatchNormalization()(conv2)
Edit 3:
After applying the LSTM architecture:
text_input = Input(shape=X_train_vec.shape[1:], name = "Text_input")
conv2 = Conv1D(filters=128, kernel_size=5, activation='relu')(text_input)
drop21 = Dropout(0.5)(conv2)
conv22 = Conv1D(filters=64, kernel_size=5, activation='relu')(drop21)
drop22 = Dropout(0.5)(conv22)
lstm1 = Bidirectional(LSTM(128, return_sequences = True))(drop22)
lstm2 = Bidirectional(LSTM(64, return_sequences = True))(lstm1)
flat = Flatten()(lstm2)
dense = Dense(128, activation='relu')(flat)
out = Dense(32, activation='relu')(dense)
outputs = Dense(y_train.shape[1], activation='softmax')(out)
model = Model(inputs=text_input, outputs=outputs)
# compile
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Overfitting can be caused by many factors; it happens when your model fits the training set too well.
To handle it, you can try the following:
Add more data
Use data augmentation
Use architectures that generalize well
Add regularization (mostly dropout; L1/L2 regularization are also possible, see the sketch below)
Reduce architecture complexity.
For more detail, you can read https://towardsdatascience.com/deep-learning-3-more-on-cnns-handling-overfitting-2bd5d99abe5d
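As an illustration of the regularization point above, here is a minimal sketch of attaching L2 regularization and dropout to a Dense layer in Keras (the input shape, layer size and regularization strength are placeholders, not values from the question):
from keras.layers import Input, Flatten, Dense, Dropout
from keras.regularizers import l2
inp = Input(shape=(100, 64))  # placeholder input shape
flat = Flatten()(inp)
# the L2 penalty discourages large weights; dropout randomly zeroes activations during training
x = Dense(128, activation='relu', kernel_regularizer=l2(0.001))(flat)
x = Dropout(0.5)(x)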
This is screaming Transfer Learning. google-universal-sentence-encoder is perfect for this use case. Replace your model with
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text
text_input = Input(shape=X_train_vec.shape[1:], name = "Text_input")
# this next layer might need some tweaking dimension wise, to correctly fit
# X_train in the model; keep text_input itself as the model's input
squeezed = tf.keras.layers.Lambda(lambda x: tf.squeeze(x))(text_input)
# conv2 = Conv1D(filters=128, kernel_size=5, activation='relu')(text_input)
# drop21 = Dropout(0.5)(conv2)
# pool1 = MaxPooling1D(pool_size=2)(drop21)
# conv22 = Conv1D(filters=64, kernel_size=5, activation='relu')(pool1)
# drop22 = Dropout(0.5)(conv22)
# pool2 = MaxPooling1D(pool_size=2)(drop22)
# 1) you might need `text_input = tf.expand_dims(text_input, axis=0)` here
# 2) If you're classifying English only, you can use the link to the normal `google-universal-sentence-encoder`, not the multilingual one
# 3) both the English and multilingual have a `-large` version. More accurate but slower to train and infer.
embedded = hub.KerasLayer('https://tfhub.dev/google/universal-sentence-encoder-multilingual/3')(squeezed)
# this layer seems out of place,
# dense = Dense(16, activation='relu')(embedded)
# you don't need to flatten after a dense layer (in your case) or a backbone (in my case (google-universal-sentence-encoder))
# flat = Flatten()(dense)
dense = Dense(128, activation='relu')(embedded)
out = Dense(32, activation='relu')(dense)
outputs = Dense(y_train.shape[1], activation='softmax')(out)
model = Model(inputs=text_input, outputs=outputs)
I think since you are doing text classification, adding 1 or 2 LSTM layers might help the network learn better, since it will be able to better associate with the context of the data. I suggest adding the following code before the Flatten layer.
# drop22 here is the output of your last Dropout layer (with the MaxPooling layers removed)
lstm1 = Bidirectional(LSTM(128, return_sequences=True))(drop22)
lstm2 = Bidirectional(LSTM(64))(lstm1)
LSTM layers can help the neural network learn associations between certain words and might improve the accuracy of your network.
I also suggest dropping the MaxPooling layers, as max pooling, especially in text classification, can lead the network to drop some useful features.
Just keep the convolutional layers and the dropout. Also remove the Dense layer before Flatten and add the aforementioned LSTMs.
It is unclear how you feed the text into your model. I am assuming that you tokenize the text to represent it as a sequence of integers, but do you use any word embedding prior to feeding it into your model? If not, I suggest you throw a trainable TensorFlow Embedding layer at the start of your model. There is a clever technique called embedding lookup to speed up its training, but you can save it for later. Try adding this layer to your model. Then your Conv1D layer would have a much easier time working on a sequence of floats. Also, I suggest you add BatchNormalization after each Conv1D; it should help to speed up convergence and training.
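A rough sketch of what that could look like (vocab_size and embedding_dim are placeholders you would set from your own tokenizer, not values from the question):
from tensorflow.keras.layers import Input, Embedding, Conv1D, BatchNormalization
vocab_size = 20000   # placeholder: size of your tokenizer's vocabulary
embedding_dim = 100  # placeholder: dimensionality of the learned word vectors
text_input = Input(shape=(None,), name="Text_input")          # sequence of token ids
embedded = Embedding(vocab_size, embedding_dim)(text_input)   # (batch, seq_len, embedding_dim)
conv = Conv1D(filters=128, kernel_size=5, activation='relu')(embedded)
conv = BatchNormalization()(conv)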
Recently, I built a simple convolutional neural network for hand gesture image recognition, using background subtraction to make the hand a white shape on the screen with a black background. It was built using Keras Conv2D for the most part. My dataset has 1000 pics for training and 100 pics for validation and testing.
The problem oddly occurs immediately after the first epoch, during which the model's loss goes down a great deal. It usually goes down from some big number like 183 to 1 at the start of the second epoch. All the pics in the dataset are of my own hand, captured with cv2, but I only conducted testing with my own hand, so that should not be a problem.
In case the dataset was the problem, I have tried 3 different datasets, one using cv2's Canny method, which essentially traces a line of the hand and makes the rest of the pic black, to see if that made a difference. Regardless, the same thing continued to happen. Furthermore, I have added multiple Dropout layers in different places to see the effect, and the same thing always occurs: the loss drastically decreases and the model shows signs of overfitting. I have also implemented EarlyStopping and tried different numbers of layers to see if that helped, but the same result always seems to occur.
model = Sequential()
model.add(Conv2D(32, (3,3), activation = 'relu',
                 input_shape = (240, 215, 1)))
model.add(MaxPooling2D((2,2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3,3), activation = 'relu'))
model.add(Conv2D(64, (3,3), activation = 'relu'))
model.add(MaxPooling2D((2,2)))
model.add(Dropout(0.25))
model.add(Conv2D(128, (3,3), activation = 'relu'))
model.add(MaxPooling2D((2,2)))
model.add(Dropout(0.25))
model.add(Conv2D(256, (3,3), activation = 'relu'))
model.add(MaxPooling2D((2,2)))
model.add(Dropout(0.25))
#model.add(Conv2D(256, (3,3), activation = 'relu'))
#model.add(MaxPooling2D((2,2)))
#model.add(Conv2D(128, (3,3), activation = 'relu'))
#model.add(MaxPooling2D((2,2)))
#model.add(Conv2D(64, (3,3), activation = 'relu'))
#model.add(MaxPooling2D((2,2)))
model.add(Flatten())
model.add(Dense(150, activation = 'relu'))
#model.add(Dropout(0.25))
#model.add(Dense(1000, activation = 'relu'))
model.add(Dropout(0.75))
model.add(Dense(6, activation = 'softmax'))
model.summary()
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy',
              metrics = ['acc'])
callbacks_list = [EarlyStopping(monitor = 'val_loss', patience = 10),
                  ModelCheckpoint(filepath = 'model.h6', monitor = 'val_loss',
                                  save_best_only = True),]
The commented sections of the code are changes I have tried to implement. I have also varied the Dropout values and positions of them a great deal and nothing significant has changed. Could anyone offer any advice on why my model overfits that quickly?
Yes, it is a clear case of overfitting. Here are my suggestions:
Try reducing the number of hidden layers
Increase the dropout to 0.5
Create more synthetic images or apply transformations to the raw images (see the sketch below).
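For the last point, Keras' ImageDataGenerator is one way to create transformed copies of the training images on the fly; a minimal sketch (the directory path and augmentation ranges are placeholders, not taken from the question):
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(
    rescale=1./255,          # scale pixel values to [0, 1]
    rotation_range=15,       # small random rotations
    width_shift_range=0.1,   # random horizontal shifts
    height_shift_range=0.1,  # random vertical shifts
    zoom_range=0.1)          # random zoom
train_generator = train_datagen.flow_from_directory(
    'data/train',            # placeholder path to the training images
    target_size=(240, 215),
    color_mode='grayscale',
    class_mode='categorical')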
When dealing with such a massive overfitting phenomenon, a good starting point would be to reduce your number of layers.
Although you add a Dropout after many max-poolings, you still suffer from overfitting.
Below are some of my recommendations:
Ensure that you have a comprehensive dataset with clean labels. Regardless of how we might want to tune the neural network, if the dataset is not clean, we cannot obtain good results.
Add (for the beginning) a maximum of 3 stacks of convolution + max-pooling + dropout; (32 + 64 + 128) filters would be a good starting point.
Use GlobalAveragePooling2D instead of Dense layers. The latter are not needed in a convolutional neural network, except for the last layer with sigmoid or softmax.
Try using SpatialDropout2D. As compared to typical Dropout, which is applied to each element in the feature map, SpatialDropout2D drops entire feature maps.
Try to use data augmentation. In this way, you create more artificial examples and your network will be less prone to overfitting.
If none of these work, ensure that you use a pre-trained network and apply transfer learning to your task at hand. A rough sketch combining some of these suggestions is shown below.
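A smaller architecture along those lines, assuming the same (240, 215, 1) input and 6 classes as in the question (the filter counts and dropout rates are only illustrative):
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, SpatialDropout2D, GlobalAveragePooling2D, Dense
model = Sequential()
model.add(Conv2D(32, (3,3), activation='relu', input_shape=(240, 215, 1)))
model.add(MaxPooling2D((2,2)))
model.add(SpatialDropout2D(0.2))  # drops whole feature maps instead of single activations
model.add(Conv2D(64, (3,3), activation='relu'))
model.add(MaxPooling2D((2,2)))
model.add(SpatialDropout2D(0.2))
model.add(Conv2D(128, (3,3), activation='relu'))
model.add(MaxPooling2D((2,2)))
model.add(SpatialDropout2D(0.2))
model.add(GlobalAveragePooling2D())  # replaces Flatten + large Dense layers
model.add(Dense(6, activation='softmax'))
model.summary()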
I'm working on a handwritten digit recognition problem, using OpenCV for preprocessing and Keras/Tensorflow for inference. I trained a model on the MNIST handwritten digit dataset, where each image is 28x28 pixels. Now I'm working with a new set of digits and I plan to do further training with the original model architecture and transfer learning via weight initialisation.
So here's my problem: I'm having an issue with losing certain features when I downsize to 28x28 pixels. Here's an example
That's meant to be a two, and the tiny gap in the top loop is important in helping differentiate it from a 9 or an 8. But my preprocessed version loses the gap, so the loop looks closed.
I have posted another question about how to do the downsizing without losing the features. Alternatively, I could downsize to a larger size like 56x56 pixels, where I'm less likely to lose such features. How can I set things up so that this new size blends in with the model without rendering the pre-trained weights useless?
Here is the definition of the pre-trained model:
def define_model(learning_rate, momentum):
    model = Sequential()
    model.add(Conv2D(32, (3,3), activation = 'relu', kernel_initializer = 'he_uniform', input_shape=(28,28,1)))
    model.add(MaxPooling2D((2,2)))
    model.add(Conv2D(64, (3,3), activation = 'relu', kernel_initializer = 'he_uniform'))
    model.add(Conv2D(64, (3,3), activation = 'relu', kernel_initializer = 'he_uniform'))
    model.add(MaxPooling2D((2,2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='softmax'))
    opt = SGD(lr=learning_rate, momentum=momentum)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
Here's one idea I had: Increase the size of the max-pool kernel after the first layer such that the output of that layer has the same shape as if I used 28x28 pixel images. (but won't that cause me to lose the feature anyway?)
Why don't you upscale MNIST for training? Your question is about the resolution of the images; the MNIST dataset was created long ago, when GPU memories were still very small. Recent models all have image dimensions bigger than 200x200; for example, ResNet uses 224x224 as its input shape. Since your images are already low resolution to begin with, downscaling them further makes it hard for the model to differentiate the digits. Since your model is fairly simple, I would suggest upscaling the training dataset.
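A minimal sketch of upscaling the MNIST images with TensorFlow before training (the 56x56 target size is only an example):
import tensorflow as tf
from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train[..., None].astype('float32') / 255.0   # shape (60000, 28, 28, 1)
x_train_big = tf.image.resize(x_train, (56, 56))          # bilinear resize to 56x56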
And yes, if you use the pooling you mentioned, you will probably lose information as well.
Hope this helps.
One option would be, as suggested above, to upscale the initial dataset from 28x28 to 56x56, for example.
A second option is to add an additional MaxPooling or AveragePooling layer at the beginning of your trained model, for example:
new_input = Input(shape=(56, 56, 1), name='new_input')
x = AveragePooling2D((2,2), name='avg_pool')(new_input)
new_output = trained_model(x)
new_model = Model(new_input, new_output)
Here is a summary of the new model:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
new_input (InputLayer)       (None, 56, 56, 1)         0
_________________________________________________________________
avg_pool (AveragePooling2D)  (None, 28, 28, 1)         0
_________________________________________________________________
trained_model (Sequential)   (None, 10)                159254
=================================================================
Total params: 159,254
Trainable params: 159,254
Non-trainable params: 0
_________________________________________________________________
I'm researching the possibility of implementing a CNN in order to classify images as "good" or "bad" but am having no luck with my current architecture.
Characteristics that denote a "bad" image:
Overexposure
Oversaturation
Incorrect white balance
Blurriness
Would it be feasible to implement a neural network to classify images based on these characteristics or is it best left to a traditional algorithm that simply looks at the variance in brightness/contrast throughout an image and classifies it that way?
I have attempted training a CNN using the VGGNet architecture but I always seem to get a biased and unreliable model, regardless of the number of epochs or number of steps.
Examples:
My current model's architecture is very simple (as I am new to the whole machine learning world) but seemed to work fine with other classification problems, and I have modified it slightly to work better with this binary classification problem:
# CONV => RELU => POOL layer set
# define convolutional layers, use "ReLU" activation function
# and reduce the spatial size (width and height) with pool layers
model.add(Conv2D(32, (3, 3), padding="same", input_shape=input_shape)) # 32 3x3 filters (height, width, depth)
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)
# (CONV => RELU) * 2 => POOL layer set (increasing number of layers as you go deeper into CNN)
model.add(Conv2D(64, (3, 3), padding="same", input_shape=input_shape)) # 64 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(Conv2D(64, (3, 3), padding="same", input_shape=input_shape)) # 64 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)
# (CONV => RELU) * 3 => POOL layer set (input volume size becoming smaller and smaller)
model.add(Conv2D(128, (3, 3), padding="same", input_shape=input_shape)) # 128 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(Conv2D(128, (3, 3), padding="same", input_shape=input_shape)) # 128 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(Conv2D(128, (3, 3), padding="same", input_shape=input_shape)) # 128 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)
# only set of FC => RELU layers
model.add(Flatten())
model.add(Dense(512))
model.add(Activation("relu"))
model.add(BatchNormalization())
model.add(Dropout(0.5))
# sigmoid classifier (output layer)
model.add(Dense(classes))
model.add(Activation("sigmoid"))
Are there any glaring omissions or mistakes in this model, or can I simply not solve this problem using deep learning (with my current GPU, a GTX 970)?
Thanks for your time and experience,
Josh
EDIT:
Here is my code for compiling/training the model:
# initialise the model and optimiser
print("[INFO] Training network...")
opt = SGD(lr=initial_lr, decay=initial_lr / epochs)
model.compile(loss="sparse_categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
# set up checkpoints
model_name = "output/50_epochs_{epoch:02d}_{val_acc:.2f}.model"
checkpoint = ModelCheckpoint(model_name, monitor='val_acc', verbose=1,
                             save_best_only=True, mode='max')
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2,
                              patience=5, min_lr=0.001)
tensorboard = TensorBoard(log_dir="logs/{}".format(time()))
callbacks_list = [checkpoint, reduce_lr, tensorboard]
# train the network
H = model.fit_generator(training_set, steps_per_epoch=500, epochs=50, validation_data=test_set, validation_steps=150, callbacks=callbacks_list)
Independently of any other advice (including the answer already provided), and assuming classes=2 (which you don't clarify; there is a reason we ask for an MCVE here), you seem to be making a fundamental mistake in your final layer, i.e.:
# sigmoid classifier (output layer)
model.add(Dense(classes))
model.add(Activation("sigmoid"))
A sigmoid activation is suitable only if your final layer consists of a single node; if classes=2, as I suspect, based also on your puzzling statement in the comments that
with three different images, my results are 0.987 bad and 0.999 good
and
I was giving you the predictions from the model previously
you should use a softmax activation, i.e.
model.add(Dense(classes))
model.add(Activation("softmax"))
Alternatively, you could use sigmoid, but your final layer should consist of a single node, i.e.
model.add(Dense(1))
model.add(Activation("sigmoid"))
The latter is usually preferred in binary classification settings, but the results should be the same in principle.
UPDATE (after updating the question):
sparse_categorical_crossentropy is not the correct loss here, either.
All in all, try the following changes:
model.compile(loss="binary_crossentropy", optimizer=Adam(), metrics=["accuracy"])
# final layer:
model.add(Dense(1))
model.add(Activation("sigmoid"))
with the Adam optimizer (which needs an import). Also, dropout should not be used by default (see this thread); start without it and only add it if necessary (i.e. if you see signs of overfitting).
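Putting those changes together, a sketch (assuming model is the Sequential model from the question, with its original final Dense/Activation removed):
from keras.optimizers import Adam
# final layer: a single node with sigmoid for binary classification
model.add(Dense(1))
model.add(Activation("sigmoid"))
# binary_crossentropy expects plain 0/1 labels, e.g. class_mode='binary' if you use flow_from_directory
model.compile(loss="binary_crossentropy", optimizer=Adam(), metrics=["accuracy"])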
I suggest you go for transfer learning instead of training the whole network.
Use weights trained on a huge dataset like ImageNet.
You can easily do this using Keras: import a model with pre-trained weights, such as Xception, replace the last layer (which represents the 1000 classes of the ImageNet dataset) with a 2-node Dense layer, because you have only 2 classes, and set trainable=False for the base layers and trainable=True for the custom added layers (such as the Dense layer with 2 nodes).
Then you can train the model in the usual way.
Demo code -
from keras.applications import Xception
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
base_model = Xception(input_shape=(img_width, img_height, 3), weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
predictions = Dense(2, activation='softmax')(x)
model = Model(base_model.input, predictions)
# freezing the base layer weights
for layer in base_model.layers:
    layer.trainable = False
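As a follow-up to training the model "in the usual way", the compile and fit steps might look roughly like this (training_set and test_set are placeholders for your own generators; with the 2-node softmax output, the labels are assumed to be one-hot/categorical):
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# training_set / test_set are hypothetical generators yielding (images, one-hot labels)
model.fit_generator(training_set, epochs=10, validation_data=test_set)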