Neural Network always predicting the same class - python

I've developed an image classifier using a convolutional neural network. The whole code is written using Keras. The dataset contains .jpg images of size 360x480 with labels 0, 1, 2 and 3. The data is balanced, so there is the same number of pictures for each label in both the training and validation datasets.
I've organized my data in directories to use a data generator function that will load the images while the model is training.
The organization of the data is as follows:
Data/
    Train/
        0/
            a1.jpg
            a2.jpg
            ...
        1/
            b1.jpg
            b2.jpg
            ...
        ...
The same applies for all labels and for the test data.
The code used to define and fit the neural network is the following:
model = Sequential()
model.add(Conv2D(128, kernel_size=3, activation='relu', input_shape=input_shape, padding="valid"))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64, kernel_size=3, activation='relu', padding="valid"))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(32, kernel_size=3, activation='relu', padding="valid"))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(512, activation="relu"))
model.add(Dense(4, activation="softmax"))
opt = SGD(lr=learning_rate)
model.compile(loss="categorical_crossentropy", optimizer=opt,
              metrics=["accuracy"])
train_datagen = ImageDataGenerator(
    width_shift_range=0.05,
    height_shift_range=0.05,
    rescale=1./255,
    horizontal_flip=True)
validate_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    path_data + '/train',
    shuffle=True,
    target_size=input_generator,
    batch_size=batch_size)
validation_generator = train_datagen.flow_from_directory(
    path_data + '/validate',
    shuffle=True,
    target_size=input_generator,
    batch_size=batch_size)
steps_per_epoch = np.ceil(train_size/batch_size)
validation_steps = np.ceil(validation_size/batch_size)
H = model.fit_generator(
    train_generator,
    steps_per_epoch=steps_per_epoch,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=validation_steps)
When I run the network for training, the training accuracy stays around 0.25, and when I calculate a confusion matrix on the test dataset, I realize it is predicting everything as the same class. What could be happening?
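For reference, the confusion matrix on the test set was computed roughly like this (a sketch, not my exact code; the '/test' directory mirrors the train/validate folders and is an assumption here):
import numpy as np
from sklearn.metrics import confusion_matrix

# Non-shuffled generator so predictions line up with test_generator.classes
test_generator = validate_datagen.flow_from_directory(
    path_data + '/test',
    target_size=input_generator,
    batch_size=batch_size,
    shuffle=False)

# Predict class probabilities for every test image, then take the argmax
steps = int(np.ceil(test_generator.samples / batch_size))
probs = model.predict_generator(test_generator, steps=steps)
y_pred = np.argmax(probs, axis=1)

# Rows: true labels, columns: predicted labels
print(confusion_matrix(test_generator.classes, y_pred))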
I've tried many different experiments to determine where the problem could come from:
I've changed the network to a VGG architecture, training all the layers, reducing the size of the pictures to 150x150 with different learning rates (0.001, 0.01, 0.1).
I've changed the dataset to only 16 photos for training (4 for each label) and 5 for validation (1 for each of three labels, and 2 for the fourth label). Using VGG as in the last bullet point, with 100 epochs (trying to force overfitting), the network hasn't learned anything. I even tried reducing the size of the pictures to 50x50 and the problem persists.
After these attempts, I created a dataset of 16 pictures (150x150) for training and 5 for validation, where each picture is just a plain color (red, blue, yellow, green, one for each label): no shapes, no images, only plain color (a sketch of how such images can be generated is shown after this list). The neural network still hasn't been able to learn (in the same configuration as the previous two bullet points).
None of this has solved the problem; the network still predicts everything to be from the same class.
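The plain-color sanity-check images were generated roughly like this (a sketch; the exact colors, sizes and output paths are placeholders):
import os
import numpy as np
from PIL import Image

# One solid colour per label, no shapes or textures at all
colours = {'0': (255, 0, 0), '1': (0, 0, 255), '2': (255, 255, 0), '3': (0, 255, 0)}

for label, rgb in colours.items():
    out_dir = os.path.join('data_plain', 'train', label)
    os.makedirs(out_dir, exist_ok=True)
    for i in range(4):  # 4 images per label, 16 in total
        img = np.full((150, 150, 3), rgb, dtype=np.uint8)
        Image.fromarray(img).save(os.path.join(out_dir, f'{label}_{i}.jpg'))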
CURRENT STATE
I transformed the problem into a binary classification (merging labels 0 with 1, and 2 with 3). This first step makes sense since the labels were levels of dirt in a pipe (0: 0-25%, 1: 25-50%, ...).
I used a ResNet50 architecture without pretrained weights.
I cropped the images to 224x224.
Results: the output of the model for a few images makes sense (it is no longer always [1, 0]), and the accuracy rose over the epochs (at least on the training dataset, meaning that the network is learning). A sketch of this setup is shown below.
Next steps will be optimizing the model, and maybe coming back to the initial label classification since it was the original purpose of the project.
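For completeness, the current setup looks roughly like this (a sketch: the ResNet50 backbone without pretrained weights, the 224x224 input and the 2-class softmax output match what I described; the pooling/dense head and the optimizer are assumptions):
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
from tensorflow.keras.models import Model

# ResNet50 trained from scratch (weights=None) on 224x224 crops, 2 merged classes
base = ResNet50(weights=None, include_top=False, input_shape=(224, 224, 3))
x = GlobalAveragePooling2D()(base.output)
output = Dense(2, activation='softmax')(x)

model = Model(inputs=base.input, outputs=output)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])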

Related

CNN model did not learn anything from the training data. Where are the mistakes I made?

The shape of the train/test data is (samples, 256, 256, 1). The training dataset has around 1400 samples, the validation dataset has 150 samples, and the test dataset has 250 samples. I then built a CNN model for a six-class classification task. However, no matter how I tune the parameters and add or remove layers (conv and dense), I get chance-level accuracy all the time (around 16.5%). Thus, I would like to know whether I made some fatal mistake while building the model, or whether there is something wrong with the data itself rather than the CNN model.
Code:
def build_cnn_model(input_shape, activation='relu'):
    model = Sequential()
    # 3 convolution layers with max pooling
    model.add(Conv2D(64, (5, 5), activation=activation, padding='same', input_shape=input_shape))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(128, (5, 5), activation=activation, padding='same'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(256, (5, 5), activation=activation, padding='same'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    # 3 fully connected layers
    model.add(Dense(1024, activation=activation))
    model.add(Dropout(0.5))
    model.add(Dense(512, activation=activation))
    model.add(Dropout(0.5))
    model.add(Dense(6, activation='softmax'))  # 6 classes
    # summarize the model
    print(model.summary())
    return model

def compile_and_fit_model(model, X_train, y_train, X_vali, y_vali, batch_size, n_epochs, LR=0.01):
    # compile the model
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=LR),
        loss='sparse_categorical_crossentropy',
        metrics=['sparse_categorical_accuracy'])
    # fit the model
    history = model.fit(x=X_train,
                        y=y_train,
                        batch_size=batch_size,
                        epochs=n_epochs,
                        verbose=1,
                        validation_data=(X_vali, y_vali))
    return model, history
I transformed the MEG data my professor recorded into magnitude scalograms using the CWT; pywt.cwt(data, scales, wavelet) was used. If I plot the coefficients I get from cwt, I get a graph like this (I merged 62 channels into one graph).
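Roughly, the scalogram step looks like this (a sketch; the random array, the scale range and the wavelet name are placeholders, not my exact values):
import numpy as np
import pywt

data = np.random.randn(256)          # stand-in for one MEG channel (1-D signal)
scales = np.arange(1, 64)
coeffs, freqs = pywt.cwt(data, scales, 'morl')

# Magnitude scalogram used as a 2-D "image" input for the CNN
scalogram = np.abs(coeffs)           # shape: (len(scales), len(data))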
I used the coefficients as train/test data for the CNN model. However, I have tuned the parameters and tried adding/removing layers in the CNN model, and the classification accuracy was unchanged. Thus, I want to know where I made mistakes: did I make mistakes building the CNN model, or did I make mistakes with the CWT (the way I handled the data)?
Please give me some advice, thank you.
What is the accuracy on the training data? If you have a small dataset and the model does not overfit after training for a while, then something is wrong with the model. You can also test with existing datasets that the model should be able to handle (like Fashion MNIST); a quick sanity check along these lines is sketched below.
Testing whether you handled the data correctly is harder. Did you write unit tests for the different steps in the preprocessing pipeline?
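For example, a sanity check could look like this (a sketch using tf.keras's built-in Fashion MNIST loader; the small model here is an assumption, not your exact architecture). If even this does not get well above chance within a few epochs, the training code itself is suspect:
import tensorflow as tf
from tensorflow.keras import Sequential, layers

# Fashion MNIST: 10 classes of 28x28 grayscale images, reshaped to (samples, 28, 28, 1)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

# Minimal CNN in the same spirit as build_cnn_model, with 10 output classes
model = Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['sparse_categorical_accuracy'])
model.fit(x_train, y_train, epochs=3, batch_size=64,
          validation_data=(x_test, y_test))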

Time series classification using CNN

I am trying to build a convolutional neural network which classifies time series data into two classes. For the time being I only have a small dataset, so what I need first is to augment my data so I can feed it into a network.
For the data augmentation task, I found some very helpful methods in the https://github.com/uchidalab/time_series_augmentation repository. What I have tried so far is adding some Gaussian noise to my data, plus permutation, time warping, window slicing and window warping methods. These methods are applied to a (batches, batch_rows, channels) = (354, 400, 3) dataset to generate a (1770, 400, 3) dataset (including the train and test datasets and their corresponding labels).
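For instance, the Gaussian-noise (jittering) step is essentially this (a sketch; the array here is a random stand-in for my real data and sigma is a placeholder value):
import numpy as np

def jitter(x, sigma=0.03):
    # Add zero-mean Gaussian noise to a (batches, timesteps, channels) array
    return x + np.random.normal(loc=0.0, scale=sigma, size=x.shape)

x_raw = np.random.randn(354, 400, 3)                        # stand-in for the real training array
augmented = np.concatenate([x_raw, jitter(x_raw)], axis=0)  # shape (708, 400, 3)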
Given that I have a limited number of inputs, I would like to know if you have any suggestions for a 1D CNN structure that performs well on these datasets.
What I have tried so far is this network:
verbose, epochs, batch_size = 0, 10, 8
n_timesteps, n_features, n_outputs = trainX.shape[1], trainX.shape[2], trainy.shape[1]
model = Sequential()
model.add(Conv1D(filters=16, kernel_size=3, activation='relu', input_shape=(n_timesteps,n_features)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(n_outputs, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit network
model.fit(trainX, trainy, epochs=epochs, batch_size=batch_size, verbose=verbose)
# evaluate model
_, accuracy = model.evaluate(testX, testy, batch_size=batch_size, verbose=0)
No matter what changes I make to the parameters and hyperparameters, I always get an accuracy around 50%, meaning the binary classifier is doing no better than chance.
I would really appreciate it if anyone could tell me what the problem probably is. Does this happen due to poor data quality produced by the augmentation methods, or does it have to do with the network itself?
Thanks in advance
If it is a classification between two classes, you should use binary_crossentropy as the loss function, together with a single sigmoid output unit instead of the two-unit softmax; that change is sketched below.
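Concretely (a minimal sketch that keeps the layer sizes and the n_timesteps/n_features variables from your code; trainy would then need to be a flat array of 0/1 labels rather than one-hot):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

model = Sequential()
model.add(Conv1D(filters=16, kernel_size=3, activation='relu',
                 input_shape=(n_timesteps, n_features)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
# single unit + sigmoid for the two-class case
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])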

Vgg16 for gender detection (male,female)

We have used VGG16, frozen all but the last 4 layers, and retrained those last 4 layers on a gender dataset of 12k male and 12k female images (from the IMDB dataset). It gives very low accuracy, especially for male. On female test data it gives female as the output, but on male test data it gives the same (female) output.
vgg_conv = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the layers except the last 4 layers
for layer in vgg_conv.layers[:-4]:
    layer.trainable = False

# Create the model
model = models.Sequential()
# Add the vgg convolutional base model
model.add(vgg_conv)
# Add new layers
model.add(layers.Flatten())
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(2, activation='softmax'))

nTrain = 16850
nTest = 6667

train_datagen = image.ImageDataGenerator(rescale=1./255)
test_datagen = image.ImageDataGenerator(rescale=1./255)

batch_size = 12
batch_size1 = 12

train_generator = train_datagen.flow_from_directory(train_dir, target_size=(224, 224),
                                                    batch_size=batch_size,
                                                    class_mode='categorical', shuffle=False)
test_generator = test_datagen.flow_from_directory(test_dir, target_size=(224, 224),
                                                  batch_size=batch_size1,
                                                  class_mode='categorical', shuffle=False)

model.compile(optimizer=optimizers.RMSprop(lr=1e-6), loss='categorical_crossentropy', metrics=['acc'])

history = model.fit_generator(train_generator,
                              steps_per_epoch=train_generator.samples/train_generator.batch_size,
                              epochs=3,
                              validation_data=test_generator,
                              validation_steps=test_generator.samples/test_generator.batch_size,
                              verbose=1)

model.save('gender.h5')
Testing Code:
model=load_model('age.h5')
img=load_img('9358807_1980-12-28_2010.jpg', target_size=(224,224))
img=img_to_array(img)
img=img.reshape((1,img.shape[0],img.shape[1],img.shape[2]))
img=preprocess_input(img)
yhat=model.predict(img)
print(yhat.size)
label=decode_predictions(yhat)
label=label[0][0]
print('%s(%.2f%%)'% (label[1],label[2]*100))
Firstly, you are saving the model as gender.h5, but during testing you are loading the model age.h5. You have probably pasted different code for the testing here.
Coming to improving the accuracy of the program:
Most importantly, you are using loss='categorical_crossentropy'; change it to loss='binary_crossentropy' in model.compile, as you have just 2 classes. Your compile call then becomes model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']).
Also change class_mode='categorical' to class_mode='binary' in flow_from_directory.
As categorical_crossentropy goes hand in hand with a softmax activation in the last layer, if you change the loss to binary_crossentropy the last activation should also be changed to sigmoid. So the last layer should be Dense(1, activation='sigmoid'). (A sketch combining these changes follows this list.)
You have added 2 Dense layers of 4096 units; the connection between them alone adds 4096 * 4096 = 16,777,216 weights to be learnt by the model. Reduce them, maybe to 1024 and 512 respectively.
You have added a Dropout layer of 0.5, which switches off 50% of the neurons during training; that is a lot. It is better to drop the Dropout layer and use it only if your model is overfitting.
Set batch_size = 1. As you have very little input, let every epoch have the same number of steps as there are input records.
Use data augmentation techniques like horizontal_flip, vertical_flip, shear_range and zoom_range of ImageDataGenerator to generate new batches of training and validation images during every epoch.
Train your model for a large number of epochs. You are only training for epochs=3, which is too few for learning the weights. Train for epochs=50 and trim the number down later.
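Putting those changes together, the training setup would look roughly like this (a sketch, not a drop-in replacement; the augmentation parameters and the reuse of your train_dir are assumptions):
from keras import models, layers, optimizers
from keras.applications import VGG16
from keras.preprocessing import image

vgg_conv = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in vgg_conv.layers[:-4]:
    layer.trainable = False

model = models.Sequential()
model.add(vgg_conv)
model.add(layers.Flatten())
model.add(layers.Dense(1024, activation='relu'))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))   # single unit for male/female

model.compile(optimizer=optimizers.RMSprop(lr=1e-6),
              loss='binary_crossentropy', metrics=['acc'])

# Augmented training batches, binary labels, batch_size=1
train_datagen = image.ImageDataGenerator(rescale=1./255, horizontal_flip=True,
                                         shear_range=0.1, zoom_range=0.1)
train_generator = train_datagen.flow_from_directory(train_dir, target_size=(224, 224),
                                                    batch_size=1, class_mode='binary')

history = model.fit_generator(train_generator,
                              steps_per_epoch=train_generator.samples // train_generator.batch_size,
                              epochs=50, verbose=1)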
Hope this answers your question. Happy Learning.

Is it possible to classify patches of the image of the same object, but with different areas with CNN?

I have two pictures of a printed object: the first has a printed area of 2.5x2.5 cm^2, and the second is the same object but with a printed area of 5.0x5.0 cm^2. After separating the object from the background and equalizing the histogram of both pictures, I am trying to use small patches (64x64) in a deep learning approach (CNN) to understand their patterns and classify them. I am trying to use 64x64 patches from the 2.5x2.5cm^2 printed objects to train a deep learning classifier and test it with patches from 5.0x5.0cm^2 objects. The digital images of both objects have approximately the same resolution, as determined by the object extractor. Here are examples of the 64x64 patches used to train and test the CNN binary classifier.
64x64 patch of a 2.5x2.5cm^2 object
64x64 patch of a 5x5cm^2 object
The classes I want to predict are the following:
Negative Class (printed for the first time)
Positive class (copied and reprinted)
What I found out:
Patches from 2.5x2.5cm^2 objects are easily classified if the CNN is trained with patches from objects of the same size (area).
If the CNN is trained with 64x64 patches from 2.5x2.5cm^2 objects and tested with 64x64 patches from 5x5cm^2 objects, the predictions all fall in one class (50% accuracy).
Some multiscale and multi-resolution descriptors, such as Bag of Visual Words, work perfectly in this scenario.
Other baseline CNNs also fail in this scenario, such as MobileNet, DenseNet and ResNet.
I tried to include zooms in my data augmentation procedure (as suggested by one answer). It did not work either :-(
This is the Keras model I have tried so far:
model = Sequential()
# GROUP1
model.add(Conv2D(filters=32, kernel_size=3, strides=1, padding='same',
                 input_shape=input_shape))
model.add(LeakyReLU(alpha=0.2))
# GROUP2
model.add(Conv2D(filters=32, kernel_size=3, strides=2, padding='same'))
model.add(LeakyReLU(alpha=0.2))
model.add(BatchNormalization(axis=-1, momentum=0.9, epsilon=0.001))
# GROUP3
model.add(Conv2D(filters=64, kernel_size=3, strides=1, padding='same'))
model.add(LeakyReLU(alpha=0.2))
model.add(BatchNormalization(axis=-1, momentum=0.9, epsilon=0.001))
# GROUP4
model.add(Conv2D(filters=64, kernel_size=3, strides=2, padding='same'))
model.add(LeakyReLU(alpha=0.2))
model.add(BatchNormalization(axis=-1, momentum=0.9, epsilon=0.001))
# GROUP5
model.add(Conv2D(filters=96, kernel_size=3, strides=1, padding='same'))
model.add(LeakyReLU(alpha=0.2))
model.add(BatchNormalization(axis=-1, momentum=0.9, epsilon=0.001))
# GROUP6
model.add(Conv2D(filters=96, kernel_size=3, strides=2, padding='same'))
model.add(LeakyReLU(alpha=0.2))
model.add(BatchNormalization(axis=-1, momentum=0.9, epsilon=0.001))
model.add(Flatten())
model.add(Dense(1024))
model.add(LeakyReLU(alpha=0.2))
model.add(Dense(2, activation='softmax'))
return model
and here is the data augmentation I am using
datagen = ImageDataGenerator(
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
    zoom_range=0.2,
    fill_mode='nearest')

datagen.fit(x_train)
datagen.fit(x_validation)

# Fit the model on the batches generated by datagen.flow().
model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=x_train.shape[0] // batch_size,
                    validation_data=datagen.flow(x_validation, y_validation, batch_size=batch_size),
                    epochs=nb_epoch, verbose=1, max_q_size=100,
                    validation_steps=x_validation.shape[0] // batch_size,
                    callbacks=[lr_reducer, early_stopper, csv_logger, model_checkpoint])
So, is there any solution to increase the accuracy in this very difficult scenario for CNNs? I mean, a CNN learns features from data and, as you can see, the training and testing data from the same class look different. So, is it possible for me to perform any data augmentation or CNN operation (my CNN has no dropout and no pooling, as you can see above) that could approximate or simulate the testing data within the training data?
If your goal is to predict at multiple zoom levels, you need to train the CNN with multiple zoom levels...
I think the current augmentation is generating samples which are not what you want. For example, this is one of the images that could be generated when zoom=1.2:
The simplest solution would be to use a generator like this when training with the 5x5cm^2 patches:
ImageDataGenerator(horizontal_flip=True,
                   vertical_flip=True,
                   zoom_range=[0.5, 1])
In that case, when zoom=0.5 you will get an image like this:
which is more or less equivalent to a 2.5x2.5cm^2 image.
If you have to train it using the 2.5x2.5 patches try:
ImageDataGenerator(horizontal_flip=True,
                   vertical_flip=True,
                   zoom_range=[1, 2],
                   fill_mode='constant',
                   cval=0)
which generate images like this one:
With enough samples and epochs, the CNN should be able to learn that the padding zeros can be ignored.
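For example, training on the 2.5x2.5cm^2 patches could then look roughly like this (a sketch reusing your x_train/x_validation arrays; the batch size and epoch count are placeholders):
from keras.preprocessing.image import ImageDataGenerator

# Zoom out up to 2x with constant (black) padding so the 2.5x2.5cm^2 patches
# also cover the apparent scale of the 5x5cm^2 patches seen at test time
datagen = ImageDataGenerator(horizontal_flip=True,
                             vertical_flip=True,
                             zoom_range=[1, 2],
                             fill_mode='constant',
                             cval=0)

model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
                    steps_per_epoch=len(x_train) // 32,
                    validation_data=(x_validation, y_validation),
                    epochs=50)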

Deal with large data set for image classification

I am using Keras for my image classification; here is my code:
train_generator = datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode="categorical")

validation_generator = datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode="categorical")
Found 70000 images belonging to 15 classes.
Found 6000 images belonging to 15 classes.
Then I am using this data to fit my model; here is my code:
model.fit_generator(
    train_generator,
    steps_per_epoch=train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=validation_samples // batch_size)
I have used various batch sizes, but the results are insufficient. My model is too slow to train (it takes hours) and it also crashes sometimes. Can someone please help with training a model on a large dataset? How can I do that efficiently?
Model Code:
# a simple stack of 3 convolution layers with a ReLU activation and followed by max-pooling layers.
model = Sequential()
model.add(Convolution2D(32, (3, 3), input_shape=(img_width, img_height,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(15))
model.add(Activation('sigmoid'))
Well, one solution to increase performance is to use a GPU.
If you are running on the TensorFlow backend, your code will automatically run on GPU if any available GPU is detected.
pip install tensorflow-gpu
Also, you can choose to use data parallelism via the multi_gpu_model utility.
from keras.utils import multi_gpu_model
# Replicates `model` on 8 GPUs.
# This assumes that your machine has 8 available GPUs.
parallel_model = multi_gpu_model(model, gpus=8)
parallel_model.compile(loss='categorical_crossentropy',
                       optimizer='rmsprop')
# This `fit` call will be distributed on 8 GPUs.
# Since the batch size is 256, each GPU will process 32 samples.
parallel_model.fit(x, y, epochs=20, batch_size=256)
I also see that your output layer uses sigmoid as the activation function. That is wrong: you have to use softmax, because sigmoid is for binary (two-class) models.
The softmax function calculates the probability distribution over n different classes (15 in your case).
In other words, this function calculates the probability of each target class over all possible target classes; the calculated probabilities are then used to determine the target class for the given input.
The main advantage of using softmax is the range of the output probabilities: each is between 0 and 1, and they all sum to 1.
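In your model that just means changing the final activation (a two-line sketch; `model` here is the stack from your code above):
from keras.layers import Dense, Activation

# 15-way softmax output so the predictions form a probability distribution
# over the 15 classes instead of 15 independent sigmoid scores
model.add(Dense(15))
model.add(Activation('softmax'))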
