I'm taking my first steps in machine learning and trying to build a sign-language recognition project using the Kaggle dataset. It is supposed to be able to predict characters in ASL. Here's the data presented by Kaggle.
Image of Dataset here.
My current issue is that I can achieve moderate accuracy on the testing data given by Kaggle, but if I try to predict a single image, say a random letter of the alphabet, it is consistently wrong. Here's my code.
from keras.models import Sequential, load_model
from keras.preprocessing.image import load_img, img_to_array
from keras.layers import Dense, Dropout, Flatten, BatchNormalization, Activation
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.optimizers import SGD
import numpy as np
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, LabelBinarizer
import matplotlib.pyplot as plt
trainer = read_csv("sign_mnist_train.csv")
labels = trainer["label"].values
trainer = trainer.drop(["label"], axis=1)
tester = read_csv("sign_mnist_test.csv")
testlabels = tester["label"].values
tester = tester.drop(["label"], axis=1)
def preProcessing(raw, classes):
    OH = OneHotEncoder(sparse=False)  # One-hot encodes the labels; could be replaced with LabelBinarizer
    binary = classes.reshape(len(classes), 1)
    binary = OH.fit_transform(binary)
    images = raw.values
    for c, i in enumerate(images, 0):
        image = np.reshape(i, (28, 28))
        image = image.flatten()
        images[c] = np.array(image)
    return images, binary
def defineModel():  # Builds the layers for our model
    model = Sequential()
    model.add(Conv2D(64, (3, 3), input_shape=(x_test.shape[1:]), activation='relu', padding='same'))
    model.add(Dropout(0.2))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(Dropout(0.2))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(y_train.shape[1], activation='softmax'))
    opt = SGD(lr=0.001, momentum=0.9)
    model.compile(optimizer=opt, loss="categorical_crossentropy", metrics=["categorical_accuracy"])
    return model
def testModel():  # Tests a single image, predicting its class.
    model = load_model("my_model.hl5")
    img = load_img("C.jpg", color_mode="grayscale", target_size=(28, 28))
    img = img_to_array(img)
    img = np.reshape(img, (-1, 28, 28, 1))
    test = model.predict_classes(img)
    print(test)
    test_test = model.predict_proba(img)[0]
    test_test = "%.2f" % (test_test[test] * 100)
    print(test_test)
if __name__ == "__main__":
    data, labels = preProcessing(trainer, labels)
    x_train, x_test, y_train, y_test = train_test_split(data, labels, test_size=0.33, random_state=42)
    x_train = x_train.astype('float32')
    x_train = x_train / 255.0
    x_train = np.reshape(x_train, (x_train.shape[0], 28, 28, 1))
    x_test = x_test.astype('float32')
    x_test = x_test / 255.0
    x_test = np.reshape(x_test, (x_test.shape[0], 28, 28, 1))
    model = defineModel()
    history = model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=40, verbose=1, batch_size=128)
    model.evaluate(x_test, y_test)
    model.save("my_model.hl5")
Apologies for the messy code, but essentially I try to break the data into usable parts using pandas, then use Keras/sklearn to fit the data. I wanted to look deeper and was advised to use accuracy_score from the sklearn library.
from sklearn.metrics import accuracy_score

testStuff, testlabels = preProcessing(tester, testlabels)
testStuff = testStuff.reshape(testStuff.shape[0], 28, 28, 1)
pred = model.predict(testStuff).round()
print(accuracy_score(testlabels, pred))
This showed that my accuracy was only around 70%, compared to the 99% that model.evaluate reported. Regardless, I still get very low accuracy on random predictions; some of my individual test images were snipped straight from the Kaggle example images. From there, I tried removing layers and increasing/reducing the filters on the Conv2D layers to see what happens, but nothing seems to make a difference. I picked up Pyplot to display the training graph and I get this. I don't see a problematic trend, but I may be looking in the wrong area.
Is it because of overfitting/underfitting? I feel that I am getting something wrong at a fundamental level and could use some tips. Looking at similar questions, they point toward possible indexing issues and other mismanagement of the dataset, but I am unsure how to test whether these issues are present in my code. This is my first time asking a question on Stack Overflow, so feel free to ask anything, since I understand that reading my rambling code/question is confusing.
Summary: Okay accuracy, bad predictions, why?
In general this behaviour often occurs due to overfitting:
Try to tweak your network to have fewer parameters and add some regularization.
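As a rough sketch of what that could look like for your defineModel() (only a starting point: the halved filter counts, the extra dropout and the 0.001 L2 factor are guesses you would have to tune, and num_classes would be y_train.shape[1] from your script):

from keras import regularizers
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.optimizers import SGD

def define_smaller_model(num_classes):
    # Fewer filters plus L2 weight decay: less capacity, more regularization.
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', padding='same',
                     input_shape=(28, 28, 1),
                     kernel_regularizer=regularizers.l2(0.001)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.3))
    model.add(Conv2D(32, (3, 3), activation='relu', padding='same',
                     kernel_regularizer=regularizers.l2(0.001)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.3))
    model.add(Flatten())
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(optimizer=SGD(lr=0.001, momentum=0.9),
                  loss='categorical_crossentropy',
                  metrics=['categorical_accuracy'])
    return model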
Further, it could be that your test set only covers part of the intended real-world domain, meaning that your training set is far from reality, which can also lead to bad predictions.
A way to tweak your dataset could be data augmentation. I assume it would work very well on this ASL dataset, but I did not take a deep look.
Data augmentation is basically an artificial way to increase the size of your dataset. It reduces overfitting and makes the model more robust to slight rotations of your hand or other "random" distortions, like a different background or different clothing.
A great article about data augmentation can be found here:
https://towardsdatascience.com/data-augmentation-for-deep-learning-4fe21d1a4eb9
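A minimal sketch of what that could look like with the variables from your script (it assumes model, x_train, y_train, x_test and y_test exist exactly as in your question, already shaped (N, 28, 28, 1) and one-hot encoded; the augmentation ranges are just starting points to tune):

from keras.preprocessing.image import ImageDataGenerator

# Small random rotations, shifts and zooms. No horizontal flips, because a
# mirrored hand would change the meaning of many ASL signs.
datagen = ImageDataGenerator(rotation_range=10,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             zoom_range=0.1)

model.fit_generator(datagen.flow(x_train, y_train, batch_size=128),
                    steps_per_epoch=len(x_train) // 128,
                    validation_data=(x_test, y_test),
                    epochs=40)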
Related
I have trained a model with Keras and Tensorflow and generated an .h5 file where the model is saved. Here is my code (I have only included the model and not the data processing snippet so that it will be more readable):
ts = 0.3 # Percentage of images that we want to use for testing. The rest is used for training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=ts, random_state=42)
# Import of keras model and hidden layers for our convolutional network
from keras.models import Sequential
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.layers import Dense, Flatten
from tensorflow.python.client import device_lib
from keras import backend as K
# Construction of model
model = Sequential()
model.add(Conv2D(32, (5, 5), activation='relu', input_shape=(120, 320, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))
# Configures the model for training
model.compile(optimizer='adam', # Optimization routine, which tells the computer how to adjust the parameter values to minimize the loss function.
loss='sparse_categorical_crossentropy', # Loss function, which tells us how bad our predictions are.
metrics=['accuracy']) # List of metrics to be evaluated by the model during training and testing.
# Trains the model for a given number of epochs (iterations on a dataset) and validates it.
model.fit(X_train, y_train, epochs=5, batch_size=64, verbose=2, validation_data=(X_test, y_test))
# Save entire model to a HDF5 file
model.save('handrecognition_model.h5')
I have trained my model on this data set, which contains photos of hand gestures. Now I have a photo taken with my laptop camera, let's say it is called "thumbs_up.jpg", and it contains a picture of me doing a thumbs up. I want to run this picture through the model I trained to see if it will predict it correctly. I know I am probably missing something extremely basic here, but how can I do this? Should I use the .h5 file somehow? Sorry if my question is super basic/obvious, I am just genuinely confused and don't know what to do. Thanks in advance.
I think what you are looking for is the predict() function built into TensorFlow/Keras. You might also use the evaluate() function to evaluate your model on multiple test images. Here is the guide that explains how it is done: https://www.tensorflow.org/guide/keras/train_and_evaluate?hl=en
# Evaluate the model on the test data using `evaluate`
print("Evaluate on test data")
results = model.evaluate(x_test, y_test, batch_size=128)
print("test loss, test acc:", results)
# Generate predictions (probabilities -- the output of the last layer)
# on new data using `predict`
print("Generate predictions for 3 samples")
predictions = model.predict(x_test[:3])
print("predictions shape:", predictions.shape)
You don't even have to save your model. Just load the image from the path where it is saved.
The predict function takes in a numpy array. You can use the cv2 library to read an image as a numpy array. This is the process I use:
import cv2
import numpy as np

# read image
image = cv2.imread(image_path)
# resize image
dimensions = (img_height, img_width)
image = cv2.resize(image, dimensions)
# change color channels from bgr to rgb
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# normalise data
image = (image/255).astype(np.float16)
# add a batch dimension so the shape becomes (1, height, width, 3)
image_data = np.expand_dims(image, axis=0)
Then you just call model.predict(image_data) and that will return predictions for whatever you feed in.
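Putting the pieces together for the model in your question, a sketch could look like the following. It assumes your training images were 120x320 grayscale and scaled to [0, 1] (your preprocessing snippet isn't shown, so adjust this to match it), and that you map the predicted index back to your own gesture labels:

import cv2
import numpy as np
from keras.models import load_model

model = load_model('handrecognition_model.h5')

# Read the photo, convert to grayscale and resize to the model's input size.
image = cv2.imread('thumbs_up.jpg', cv2.IMREAD_GRAYSCALE)
image = cv2.resize(image, (320, 120))        # cv2.resize takes (width, height)
image = image.astype('float32') / 255.0      # same scaling as assumed for training
image = image.reshape(1, 120, 320, 1)        # add batch and channel dimensions

probabilities = model.predict(image)[0]
predicted_class = np.argmax(probabilities)
print(predicted_class, probabilities[predicted_class])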
I am new to the ML field and still learning. I made a model by following a tutorial, but the resulting accuracy always jumps to 100% very quickly. I searched online, and as far as I understand I have an issue related to model overfitting. The dataset I used is pretty small, from the UCI site, named the Indian Liver Patients Dataset; it contains very few observations, around 600.
My question is how I could overcome this overfitting. Any help will be appreciated, thanks.
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import scikitplot as skplt
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
df = pd.read_csv("C:/TF/TEST/ILDP.csv")
df["ag_ratio"].fillna("0.6", inplace=True)
df.isnull().sum()
print(df.head())
LD, NLD = df['is_patient'].value_counts()
df_sex = pd.get_dummies(df['gender'])
df_new = pd.concat([df, df_sex], axis=1)
Droop_gender = df_new.drop(labels=['gender'], axis=1)
Droop_gender.columns = ['age', 'tot_bilirubin', 'direct_bilirubin', 'tot_proteins', 'albumin', 'ag_ratio',
'sgpt', 'sgot', 'alkphos', 'Female', 'Male', 'is_patient']
X = Droop_gender.drop('is_patient', axis=1)
y = Droop_gender['is_patient']
print(X.shape)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
classifier = Sequential() # Initialising the ANN
classifier.add(Dense(units=16, kernel_initializer='uniform', activation='relu', input_dim=11))
classifier.add(Dense(units=8, kernel_initializer='uniform', activation='relu'))
classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))
classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))
# compile ANN
classifier.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
# Fitting the data
histroy = classifier.fit(X_train, y_train, batch_size=20, epochs=50)
y_pred = classifier.predict(X_test)
y_pred = [1 if y >= 0.5 else 0 for y in y_pred]
print(classification_report(y_test, y_pred))
That your model is overfitting is encouraging because it means your model has the capacity to learn. Now you have to gradually reduce the capacity of your model to make it generalize better. My recommendation is to add regularization.
Add dropout layers between some of your fully connected layers:
classifier.add(Dense(units=16, kernel_initializer='uniform', activation='relu', input_dim=11))
classifier.add(keras.layers.Dropout(0.5))
classifier.add(Dense(units=8, kernel_initializer='uniform', activation='relu'))
You can add these dropout layers between any layers, but adding between layers with more neurons is better.
If that doesn't work well you can try weight decay. Here is an example from the documentation:
from keras import regularizers
model.add(Dense(64, input_dim=64,
kernel_regularizer=regularizers.l2(0.01),
activity_regularizer=regularizers.l1(0.01)))
Although, try either kernel_regularizer or activity_regularizer on its own first. They should both work about the same anyway. Try tuning and see how different parameters change things. In the end it's a lot of black magic, so you'll have to experiment a bit. Good luck!
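Putting both suggestions together on the model from your question might look roughly like this (only a sketch: the dropout rates and the 0.01 L2 factor are starting points to tune, and it reuses the X_train/X_test/y_train/y_test variables from your script; passing your existing test split as validation data lets you actually watch the train/validation gap):

from keras import regularizers
from keras.models import Sequential
from keras.layers import Dense, Dropout

classifier = Sequential()
classifier.add(Dense(units=16, activation='relu', input_dim=11,
                     kernel_regularizer=regularizers.l2(0.01)))
classifier.add(Dropout(0.5))
classifier.add(Dense(units=8, activation='relu',
                     kernel_regularizer=regularizers.l2(0.01)))
classifier.add(Dropout(0.5))
classifier.add(Dense(units=1, activation='sigmoid'))

classifier.compile(optimizer='rmsprop', loss='binary_crossentropy',
                   metrics=['accuracy'])

# With validation data the overfitting becomes visible: training accuracy keeps
# climbing while validation accuracy stalls or drops.
history = classifier.fit(X_train, y_train, batch_size=20, epochs=50,
                         validation_data=(X_test, y_test))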
I have an autoencoder, and I checked the accuracy of the model with different configurations, such as changing the number of conv layers and increasing them, adding or removing batch normalization, and changing the activation function, but the accuracy for all of them is similar and shows no improvement, which is weird. I'm confused because I think the accuracy for these different configurations should differ, but it is always 0.8156. Could you please help me figure out what the problem is? I also trained it for 10,000 epochs, but the output is the same as for 50 epochs! Is my code wrong, or can it just not get any better?
the accuracy graph
I am also not sure whether the learning rate decay is working or not.
I put my code here too:
from keras.layers import Input, Concatenate, GaussianNoise,Dropout,BatchNormalization
from keras.layers import Conv2D
from keras.models import Model
from keras.datasets import mnist,cifar10
from keras.callbacks import TensorBoard
from keras import backend as K
from keras import layers
import matplotlib.pyplot as plt
import tensorflow as tf
import keras as Kr
from keras.callbacks import ReduceLROnPlateau
from keras.callbacks import EarlyStopping
import numpy as np
import pylab as pl
import matplotlib.cm as cm
import keract
from matplotlib import pyplot
from keras import optimizers
from keras import regularizers
from tensorflow.python.keras.layers import Lambda;
image = Input((28, 28, 1))
conv1 = Conv2D(16, (3, 3), activation='elu', padding='same', name='convl1e')(image)
conv2 = Conv2D(32, (3, 3), activation='elu', padding='same', name='convl2e')(conv1)
conv3 = Conv2D(16, (3, 3), activation='elu', padding='same', name='convl3e')(conv2)
#conv3 = Conv2D(8, (3, 3), activation='relu', padding='same', name='convl3e', kernel_initializer='Orthogonal',bias_initializer='glorot_uniform')(conv2)
BN=BatchNormalization()(conv3)
#DrO1=Dropout(0.25,name='Dro1')(conv3)
DrO1=Dropout(0.25,name='Dro1')(BN)
encoded = Conv2D(1, (3, 3), activation='elu', padding='same',name='encoded_I')(DrO1)
#-----------------------decoder------------------------------------------------
#------------------------------------------------------------------------------
deconv1 = Conv2D(16, (3, 3), activation='elu', padding='same', name='convl1d')(encoded)
deconv2 = Conv2D(32, (3, 3), activation='elu', padding='same', name='convl2d')(deconv1)
deconv3 = Conv2D(16, (3, 3), activation='elu',padding='same', name='convl3d')(deconv2)
BNd=BatchNormalization()(deconv3)
DrO2=Dropout(0.25,name='DrO2')(BNd)
#DrO2=Dropout(0.5,name='DrO2')(deconv3)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same', name='decoder_output')(DrO2)
#model=Model(inputs=[image,wtm],outputs=decoded)
#--------------------------------adding noise----------------------------------
#decoded_noise = GaussianNoise(0.5)(decoded)
watermark_extraction=Model(inputs=image,outputs=decoded)
watermark_extraction.summary()
#----------------------training the model--------------------------------------
#------------------------------------------------------------------------------
#----------------------Data preparation----------------------------------------
(x_train, _), (x_test, _) = mnist.load_data()
x_validation=x_train[1:10000,:,:]
x_train=x_train[10001:60000,:,:]
#(x_train, _), (x_test, _) = cifar10.load_data()
#x_validation=x_train[1:10000,:,:]
#x_train=x_train[10001:60000,:,:]
#
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_validation = x_validation.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1)) # adapt this if using `channels_first` image data format
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1)) # adapt this if using `channels_first` image data format
x_validation = np.reshape(x_validation, (len(x_validation), 28, 28, 1))
#---------------------compile and train the model------------------------------
# is accuracy sensible metric for this model?
learning_rate = 0.1
decay_rate = learning_rate / 50
opt = optimizers.SGD(lr=learning_rate, momentum=0.9, decay=decay_rate, nesterov=False)
watermark_extraction.compile(optimizer=opt, loss=['mse'], metrics=['accuracy'])
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=20)
#rlrp = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=5, min_delta=1E-7, verbose=1)
history=watermark_extraction.fit(x_train, x_train,
epochs=50,
batch_size=32,
validation_data=(x_validation, x_validation),
callbacks=[TensorBoard(log_dir='E:/output of tensorboard', histogram_freq=0, write_graph=False),es])
watermark_extraction.summary()
#--------------------visuallize the output layers------------------------------
#_, train_acc = watermark_extraction.evaluate(x_train, x_train)
#_, test_acc = watermark_extraction.evaluate([x_test[5000:5001],wt_expand], [x_test[5000:5001],wt_expand])
#print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
## plot loss learning curves
pyplot.subplot(211)
pyplot.title('MSE Loss', pad=-40)
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='validation')
pyplot.legend()
pyplot.subplot(212)
pyplot.title('Accuracy', pad=-40)
pyplot.plot(history.history['acc'], label='train')
pyplot.plot(history.history['val_acc'], label='test')
pyplot.legend()
pyplot.show()
Since you stated that you are a beginner, I am going to try to build from the bottom up and tie the explanation back to your code as much as possible.
Part 1: Autoencoders are made up of two parts (encoders and decoders). The encoder decreases the number of variables required to store the information, and the decoder tries to recover this information from the compressed form. (Note that autoencoders are not used in real data compression tasks, due to their uncertainty and their data-dependent nature.)
Now, in your code you keep the padding as 'same':
conv1 = Conv2D(16, (3, 3), activation='elu', padding='same', name='convl1e')(image)
This basically takes away the compression and expansion behaviour of the autoencoder, i.e. in each step you are using the same number of variables to represent the information.
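For contrast, an encoder that actually shrinks the representation could look roughly like this (a minimal sketch using max-pooling to downsample and upsampling to restore the size; the filter counts are arbitrary and not tuned for your watermarking task):

from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model

image = Input((28, 28, 1))

# Encoder: each pooling step halves the spatial size, forcing real compression.
x = Conv2D(16, (3, 3), activation='relu', padding='same')(image)
x = MaxPooling2D((2, 2))(x)                    # 28x28 -> 14x14
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2))(x)              # 14x14 -> 7x7 bottleneck

# Decoder: upsampling steps restore the original resolution.
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)                    # 7x7 -> 14x14
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)                    # 14x14 -> 28x28
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(inputs=image, outputs=decoded)
autoencoder.compile(optimizer='adam', loss='mse')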
Part 2: Now moving on to how you train the model
history=watermark_extraction.fit(x_train, x_train,
epochs=50,
batch_size=32,
validation_data=(x_validation, x_validation),
callbacks=[TensorBoard(log_dir='E:/PhD/thesis/deepwatermark/journal code/autoencoder_watermark/11-2-2019/output of tensorboard', histogram_freq=0, write_graph=False),es])
From this line of code I conclude that you want to generate back the same image that you put in. Since the image is stored in the same number of variables, your model just has to pass the same image through each step without changing anything, which incentivizes your model to optimize each filter parameter to 1.
Part 3: Now comes the biggest nail in the coffin: you have implemented a dropout layer. First, you should NEVER implement dropout in a convolutional layer. This link explains why and discusses various ideas that, as a beginner, I think you should check out. Now let's see why the way you have used dropout is really bad. As already explained, the best fit for your model would be all parameters in the filters learning the value 1. What happens here is that you have forced some of those filters to turn off; as discussed in the article, that does nothing other than switch off some filters, which only decreases the intensity of your images in the next layer (since CNN filters average over all input channels).
DrO2=Dropout(0.25,name='DrO2')(BNd)
Part 4: This is just a bit of advice and not the source of any problem.
BNd=BatchNormalization()(deconv3)
Here you have tried to normalize the data over the batch. Data normalization is extremely important in most cases; as you might know, it prevents one feature from dictating the model and gives each feature an equal say. But in image data every point is already scaled between 0 and 255, so using normalization to scale it between 0 and 1 adds no value, just unnecessary computation.
I would suggest you work through this part by part, and if something is unclear, comment below. Try not to make this about autoencoders built from CNNs (they don't have any real application anyway), but rather use it to understand the various intricacies of ConvNets (CNNs). That is why I have chosen to write an answer explaining parts of your network rather than handing you code: the code for what you are looking for is just a Google search away. If you are intrigued by this answer and want to know how exactly CNNs work, check this out: https://www.youtube.com/watch?v=ArPaAX_PhIs&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF. If you have any doubts about anything in this answer, or even in those videos, comment below.
I've been using Keras with TensorFlow to classify a normalized 60x60 grayscale image of an arrow into 4 categories, its orientation: up, down, left or right. I have created a dataset of about ~1800 images, almost equally distributed across said categories.
However, there's a problem with classification. From the source where I created the dataset, there are two types of arrows: arrow shape 1,
and arrow shape 2.
The accuracy is okay for arrows shaped like 1 (about ~70% validation accuracy), but for arrows like number 2 it is terrible.
I've gone through my dataset, and about 90% of the dataset images are arrow shape 1.
Does that mean that the lack of training data for arrow shape 2 is the reason it cannot classify them as well as shape 1, and would increasing the dataset for shape 2 therefore resolve this issue?
If true, doesn't that mean that my model has failed to generalize?
Also, if the arrow colors are inverted, will the network be affected by this?
Here is the source I'm using to train data:
# -*- coding:utf-8 -*-
import cv2
import numpy as np
import os
from random import shuffle
import glob
train_dir = "images\\cropped\\traindata"
test_dir = "images\\cropped\\testdata"
MODEL_NAME = "ARROWS.model"
img_size = 60
# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Activation
from keras.layers import BatchNormalization
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import adam
from keras.callbacks import TensorBoard
from keras import backend as K
from tensorflow import Session, ConfigProto, GPUOptions
gpuoptions = GPUOptions(allow_growth=True)
session = Session(config=ConfigProto(gpu_options=gpuoptions))
K.set_session(session)
classifier = Sequential()
classifier.add(Conv2D(32, (3,3), input_shape=(img_size, img_size, 1)))
classifier.add(BatchNormalization())
classifier.add(Activation("relu"))
classifier.add(Conv2D(32, (3,3)))
classifier.add(BatchNormalization())
classifier.add(Activation("relu"))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
classifier.add(Dropout(0.25))
#classifier.add(Dropout(0.25))
classifier.add(Conv2D(64, (3,3), padding='same'))
classifier.add(BatchNormalization())
classifier.add(Activation("relu"))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
classifier.add(Dropout(0.25))
#classifier.add(Dropout(0.25))
classifier.add(Flatten())
classifier.add(Dense(128))
classifier.add(BatchNormalization())
classifier.add(Activation("relu"))
classifier.add(Dropout(0.5))
classifier.add(Dense(4))
classifier.add(BatchNormalization())
classifier.add(Activation("softmax"))
classifier.compile(optimizer = adam(lr=1e-6), loss = 'categorical_crossentropy', metrics = ['accuracy'])
train_datagen = ImageDataGenerator(rotation_range=12)
test_datagen = ImageDataGenerator(rotation_range=12)
training_set = train_datagen.flow_from_directory('images/cropped/traindata',
color_mode="grayscale",
target_size = (img_size, img_size),
batch_size = 32,
class_mode = 'categorical', shuffle=True)
test_set = test_datagen.flow_from_directory('images/cropped/testdata',
color_mode="grayscale",
target_size = (img_size, img_size),
batch_size = 32,
class_mode = 'categorical', shuffle=True)
with open("class_indices.txt", "w") as indices_fine: # Log debug data to file
indices_fine.write(str(classifier.summary()))
indices_fine.write("\n")
indices_fine.write("training_set indices:\n"+str(training_set.class_indices))
indices_fine.write("test_set indices:\n"+str(test_set.class_indices))
tbCallBack = TensorBoard(log_dir='./log', histogram_freq=0, write_graph=True, write_images=True)
classifier.fit_generator(training_set,steps_per_epoch = 8000,epochs = 15,validation_data = test_set,validation_steps = 2000, shuffle=True, callbacks=[tbCallBack])
classifier.save("arrow_classifier_keras_gray.h5")
Does that mean that the lack of training data for arrow shape 2 is the reason that it cannot classify them as well as shape 1, and would increasing the dataset for shape 2 therefore resolve this issue?
Your dataset distribution is very important and can create a bias toward a particular class, so the model does not perform as you expect. In your case, the number of examples of shape 2 is much smaller than of shape 1, which biases your deep learning model into assuming that all of the down arrows must look like shape 1, not shape 2. The solution? You already know the answer: increase the dataset for shape 2, or make shape 1 and shape 2 equally distributed within the down-arrow class.
If true, doesn't that mean that my model has failed to generalize?
Your dataset's distribution of images caused the model to fail to generalize well on that particular class (the down arrow). If your model works well on the other classes, the problem is not your model but your dataset for the down-arrow class.
Just imagine your first image is a tiny cat and the second is a fat cat, like Garfield. The distribution of cats is something we cannot change, but we need to detect all the cats (even when the cats are inverted or sprayed pink).
What I would do, for instance, if I have 1000 images of tiny cats: take some of the images, add some distortions and effects, and so make the training set bigger. This is called data augmentation (a rough sketch follows below).
You don't need to make the number of fat-cat images exactly equal to the number of tiny-cat images on purpose, if in the end you recognize them all well and you have trained your image classifier with, let's say, ~98% accuracy on this dataset.
It is important to test.
NOTE: A CNN should be able to cope with images whose colors are inverted, because of the convolution technique it uses.
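A minimal sketch of that kind of offline augmentation for the under-represented arrows, assuming you can list the shape-2 files somehow (the "down/shape2_*.png" pattern below is purely hypothetical, adjust it to however your files are actually named, and the distortion ranges are guesses):

import glob
import cv2
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

# Hypothetical glob that matches only the under-represented "shape 2" arrows.
shape2_files = glob.glob("images/cropped/traindata/down/shape2_*.png")

datagen = ImageDataGenerator(rotation_range=12,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             zoom_range=0.1)

for path in shape2_files:
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = img.reshape((1,) + img.shape + (1,))          # (1, H, W, 1) batch
    # Write a handful of distorted copies next to the original file.
    for i, batch in enumerate(datagen.flow(img, batch_size=1)):
        augmented = batch[0, :, :, 0].astype(np.uint8)
        cv2.imwrite(path.replace(".png", "_aug%d.png" % i), augmented)
        if i >= 4:                                       # five copies per image
            break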
I am pretty new to machine learning so I am playing around with examples and such.
The image size specified in the code is (28,28)
But for some reason I keep getting the same ValueError
I can't figure out why this is happening.
Here's the code:
import pandas as pd
import numpy as np
np.random.seed(1337) # for reproducibility
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.utils import np_utils
# input image dimensions
img_rows, img_cols = 28, 28
batch_size = 128 # Number of images used in each optimization step
nb_classes = 10 # One class per digit
nb_epoch = 35 # Number of times the whole data is used to learn
# Read the train and test datasets
train = pd.read_csv("../input/train.csv").values
test = pd.read_csv("../input/test.csv").values
# Reshape the data to be used by a Theano CNN. Shape is
# (nb_of_samples, nb_of_color_channels, img_width, img_heigh)
X_train = train[:, 1:].reshape(train.shape[0], 1, img_rows, img_cols)
X_test = test.reshape(test.shape[0], 1, img_rows, img_cols)
y_train = train[:, 0] # First data is label (already removed from X_train)
# Make the value floats in [0;1] instead of int in [0;255]
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
# convert class vectors to binary class matrices (ie one-hot vectors)
Y_train = np_utils.to_categorical(y_train, nb_classes)
#Display the shapes to check if everything's ok
print('X_train shape:', X_train.shape)
print('Y_train shape:', Y_train.shape)
print('X_test shape:', X_test.shape)
model = Sequential()
# For an explanation on conv layers see http://cs231n.github.io/convolutional-networks/#conv
# By default the stride/subsample is 1
# border_mode "valid" means no zero-padding.
# If you want zero-padding add a ZeroPadding layer or, if stride is 1 use border_mode="same"
model.add(Convolution2D(12, 5, 5, border_mode='valid',input_shape=(1,img_rows, img_cols)))
model.add(Activation('relu'))
# For an explanation on pooling layers see http://cs231n.github.io/convolutional-networks/#pool
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.15))
model.add(Convolution2D(24, 5, 5))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.15))
# Flatten the 3D output to 1D tensor for a fully connected layer to accept the input
model.add(Flatten())
model.add(Dense(180))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes)) #Last layer with one output per class
model.add(Activation('softmax')) #We want a score simlar to a probability for each class
# The function to optimize is the cross entropy between the true label and the output (softmax) of the model
# We will use adadelta to do the gradient descent see http://cs231n.github.io/neural-networks-3/#ada
model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=["accuracy"])
# Make the model learn
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=1)
# Predict the label for X_test
yPred = model.predict_classes(X_test)
# Save prediction in file for Kaggle submission
np.savetxt('mnist-pred.csv', np.c_[range(1,len(yPred)+1),yPred], delimiter=',', header = 'ImageId,Label', comments = '', fmt='%d')
So the problem is with the convolution sizes used. Convolution operations usually reduce the dimensions of the image, and similarly each pooling operation reduces the size. You have very small images, yet you applied a model architecture that was designed for bigger ones; thus at some point, after one of the convolutions/poolings, the output image is actually smaller than the following filter size, and this is an ill-defined operation.
To temporarily fix the problem, remove the second convolution and max-pooling layers, since these operations (with the parameters provided) cannot be performed on such small data. In general you should first understand how convolution works rather than apply someone else's model, since the parameters are crucial for good performance; if you apply transformations that reduce the resolution too much, you will be unable to learn anything. Once you have some intuition for how convolution works, you can go back and try different architectures, but there is no single "magical" equation to figure out the architecture, so I cannot provide you with parameters that will "just work": start by removing this additional convolution and pooling, and then try other possibilities once you have a better understanding of your data and model.
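One quick way to see exactly where it stops fitting on your setup is to add the layers one at a time and print the running output shape. This is only a sketch in the same Keras 1 style as the question (activations are omitted because they do not change the shape; whichever layer no longer fits will raise the error when you add it):

from keras.models import Sequential
from keras.layers.convolutional import Convolution2D, MaxPooling2D

model = Sequential()
model.add(Convolution2D(12, 5, 5, border_mode='valid', input_shape=(1, 28, 28)))
print(model.output_shape)   # after the first 5x5 'valid' convolution
model.add(MaxPooling2D(pool_size=(2, 2)))
print(model.output_shape)   # after the first 2x2 pooling
model.add(Convolution2D(24, 5, 5))
print(model.output_shape)   # after the second 5x5 convolution
model.add(MaxPooling2D(pool_size=(2, 2)))
print(model.output_shape)   # after the second 2x2 pooling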