I'm doing a binary classification with CNN using Matconvnet on Matconvnet. And now, I'm trying to realize it through Keras on Python. The network is not complex at all and I achieved 96% accuracy on Matconvnet. However with Keras, even I tried my best to ensure every setting is exactly the same with before, I can't get the same result. Or event worse, the model doesn't work at all.
Here are some details about the setting. Any ideas or help is appreciated!
Input
The images are 20*20 size. Training size is 400 and testing size 100, validation size 132.
Matconvnet: images stored in 20*20*sample_size method
Keras: images stored in sample_size*20*20*1 method
CNN Structure
(3*3)*3 conv- (2*2) maxpooling- fully connected- softmax- logloss
Matconvnet: Use convolutionized layer instead of fully connected one. Here is the code:
function net = initializeCNNA()
f=1/100 ;
net.layers = {} ;
net.layers{end+1} = struct('type', 'conv', ...
'weights', {{f*randn(3,3,1,3, 'single'), zeros(1, 3, 'single')}}, ...
'stride', 1, ...
'pad', 0) ;
net.layers{end+1} = struct('type', 'pool', ...
'method', 'max', ...
'pool', [2 2], ...
'stride', 2, ...
'pad', 0) ;
net.layers{end+1} = struct('type', 'conv', ...
'weights', {{f*randn(9,9,3,2, 'single'), zeros(1,2,'single')}}, ...
'stride', 1, ...
'pad', 0) ;
net.layers{end+1} = struct('type', 'softmaxloss') ;
net = vl_simplenn_tidy(net) ;
Keras:
model = Sequential()
model.add(Conv2D(3, (3,3),kernel_initializer=\
keras.initializers.RandomNormal(mean=0.0, stddev=0.1, seed=None), input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2),strides=(2, 2)))
model.add(Flatten())
model.add(Dense(2,activation='softmax',\
kernel_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.1, seed=None)))
Loss Function
Matconvnet: softmaxloss
Keras: binary_crossentropy
Optimizer
Matconvnet: SGD
trainOpts.batchSize = 50;
trainOpts.numEpochs = 20 ;
trainOpts.learningRate = 0.001 ;
trainOpts.weightDecay = 0.0005 ;
trainOpts.momentum = 0.9 ;
Keras: SGD
sgd = optimizers.SGD(lr=0.001, momentum=0.9, decay=0.0005)
model.compile(loss='binary_crossentropy',
optimizer=sgd,
metrics=['accuracy'])
Initialization: filters:N(0,0.1), bias: 0
normalization: no batch normalization except normalization while input to have 0 mean and 1 std for images.
Above are the aspects I reviewed to make sure I did the correct replication. Yet I don't understand why it doesn't work on Keras. Here are some guesses:
Matconvnet uses a convolutionized layer instead of fully connected layer and may imply some fancy way to update the parameters.
They use a different algorithm to apply SGD whose parameters have different meaning.
I also did other tries:
Change optimizer in Keras into Adadelta(). No improvement.
Change network structure and make it deeper. It works!
But still want to know why Matconvnet can achieve that good result with a much simpler one.
"Matconvnet uses a convolutionized layer instead of fully connected layer and may imply some fancy way to update the parameters."
No. Technically, there should be no difference between convolution and fully connected layers. I'm pretty sure there's no fancy way to update the parameters.
More comments coming..
some of the discussion in this post may help:
Can't replicate a matconvnet CNN architecture in Keras
Related
Firstly, I know that similar questions have been asked before, but mainly for classification problems. Mine is a regression-style problem.
I am trying to train a neural network using keras to evaluate chess positions using stockfish evaluations. The input is boards in a (12,8,8) array (representing piece placement for each individual piece) and output is the evaluation in pawns. When training, the loss stagnates at around 500,000-600,000. I have a little over 12 million boards + evaluations and I train on all the data at once. The loss function is MSE.
This is my current code:
model = Sequential()
model.add(Dense(16, activation = "relu", input_shape = (12, 8, 8)))
model.add(Dropout(0.2))
model.add(Dense(16, activation = "relu"))
model.add(Dense(10, activation = "relu"))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(1, activation = "linear"))
model.compile(optimizer = "adam", loss = "mean_squared_error", metrics = ["mse"])
model.summary()
# model = load_model("model.h5")
boards = np.load("boards.npy")
evals = np.load("evals.npy")
perf = model.fit(boards, evals, epochs = 10).history
model.save("model.h5")
plt.figure(dpi = 600)
plt.title("Loss")
plt.plot(perf["loss"])
plt.show()
This is the output of a previous epoch:
145856/398997 [=========>....................] - ETA: 26:23 - loss: 593797.4375 - mse: 593797.4375
The loss will remain at 570,000-580,000 upon further fitting, which is not ideal. The loss should decrease by a few more orders of magnitude if I am not wrong.
What is the problem and how can I fix it to make the model learn better?
I would suspect that your evaluation data contains very big values, like 100000 pawns if one of sides forcefully wins. Than, if your model predicts something like 0 in the same position, then squared error is very high and this pushes MSE high as well. You might want to check your evaluation data and ensure they are in some limited range like [-20..20].
Furthermore, evaluating a chess position is a very complex problem. It looks like your model has too few parameters for the task. Possible improvements:
Increase the numbers of neurons in your dense layers (say to 300,
200, 100).
Increase the numbers of hidden layers (say to 10).
Use convolutional layers.
Besides this, you might want to create a simple "baseline model" to better evaluate the performance of your neural network. This baseline model could be just a python function, which runs on input data and does position evaluation based on material counting (like bishop - 3 pawns, rook - 5 etc.) Than you can run this function on your dataset and see MSE for it. If your neural network produces a smaller MSE than this baseline model, than it is really learning some useful patterns.
I also recommend the following book: "Neural Networks For Chess: The magic of deep and reinforcement learning revealed" by Dominik Klein. The book contains a description of network architecture used in AlphaZero chess engine and a neural network used in Stockfish.
I have the following Neural Network:
model = Sequential()
model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=X_train.shape[1:]))
model.add(Dropout(0.2))
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(150, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(10))
model.add(Dropout(0.3))
model.add(Dense(2, activation='softmax')) # Activation_layer
opt = Adam(lr=1e-3, decay=1e-6)
model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
The network will be fed sequential data, and is trying to classify the data to either 1 or 0.
Example of one of the samples:
X:
[[0.56450562 0.69825955 0.57768099 0.69077864]
[0.58818427 0.70355375 0.61725885 0.30270281]
[0.57407927 0.72501532 0.59603936 0.29196058]
[0.56501804 0.69072662 0.59064673 0.66034622]
[0.56552001 0.70354009 0.59136487 0.1586415 ]
[0.56501496 0.68205159 0.57877241 0.62252169]
[0.54535762 0.67067675 0.58414928 0.9077868 ]
[0.56197241 0.71226839 0.5920788 0.1339519 ]
[0.57308813 0.70469134 0.59749238 0.27085101]
[0.56146488 0.69258436 0.58377929 0.7065891 ]
[0.55943607 0.69106406 0.59569036 0.69378783]
[0.5670203 0.68271571 0.58702014 0.70585781]
[0.58320254 0.71228948 0.60867704 0.19280208]
[0.56904526 0.71490986 0.59027546 0.35757948]
[0.56398908 0.67858148 0.58197139 0.75064535]
[0.57005691 0.7062191 0.60363236 0.38345417]
[0.5705625 0.70394121 0.58630169 0.19171352]
[0.56145905 0.69106039 0.58340288 0.76821359]
[0.55183665 0.68991404 0.5935228 0.53419864]
[0.56549613 0.68800419 0.58013082 0.74470123]
[0.54926442 0.67315638 0.58336904 0.77819332]
[0.56802882 0.71842805 0.60222782 0.12845991]
[0.59591035 0.70927878 0.61161172 0.68023463]
[0.56904526 0.713053 0.58773435 0.20017562]
[0.58321778 0.69939555 0.61194041 0.47063807]
[0.57814777 0.71113559 0.58991151 0.62149082]
[0.56044844 0.69257776 0.58738045 0.39285414]
[0.56853912 0.70091102 0.59713724 0.21938703]
[0.56398364 0.69939514 0.59316136 0.43031303]
[0.56701957 0.69901619 0.5935228 0.39333831]
[0.56701916 0.68082684 0.58701647 0.84346823]
[0.57765044 0.70812209 0.60147335 0.38961049]
[0.58975543 0.71340576 0.6050683 0.61008348]
[0.57207508 0.70280098 0.59821004 0.44573693]
[0.56702537 0.71035313 0.59424384 0.30333905]
[0.58417429 0.69901619 0.60288387 0.7210835 ]
[0.56400225 0.70128289 0.59028243 0.42721302]
[0.5725759 0.70241467 0.60000056 0.22784863]
[0.57055816 0.69561772 0.59136355 0.66855609]
[0.58766922 0.70995564 0.60538235 0.71163122]
[0.57206444 0.69788453 0.59567842 0.707679 ]
[0.5775922 0.70956495 0.60249313 0.32745877]
[0.57407031 0.6997696 0.57952909 0.54327415]
[0.55346759 0.69223554 0.58920848 0.27867972]
[0.58612784 0.7031614 0.617901 0.76338596]
[0.58659902 0.72005896 0.60604811 0.48696192]
[0.57004823 0.70539865 0.59173347 0.47288217]
[0.57405756 0.7023936 0.59030119 0.49981083]
[0.55801818 0.68813345 0.58564415 0.38486918]
[0.55900944 0.69300306 0.58527681 0.41875207]
[0.56351994 0.68585174 0.58239563 0.70965566]
[0.5509523 0.69524821 0.59280378 0.46280846]
[0.56753474 0.69713124 0.59172507 0.29915786]
[0.56753451 0.69939326 0.5978358 0.59996518]
[0.56954889 0.69109776 0.57734904 0.27905973]
[0.55595081 0.68429475 0.59424321 0.86881108]
[0.57005376 0.71486763 0.60215717 0.20096972]
[0.57509255 0.70467308 0.59028491 0.29196681]
[0.5584625 0.68958804 0.59028342 0.24039387]
[0.57005412 0.70203582 0.5964024 0.59344888]]
y:
1
The issue I am having, is that the loss starts out at around 0.69 and never decreases significantly (it fluctuates a bit), and the loss and validtation loss both stay around 0.5
What I've tried so far:
Checked Training and validation data for NaN's or values < 0 or > 1 -> Not found
Reduced sample size dramatically (down to 50 samples) and a network that should be more than large enough to overfit, but alas still the same result.
Preprocessed the data in a completely different way
Using sigmoid activation instead of softmax to classify the labels.
Reduced learning rate
Removing the second last dense layer
Used LeakyReLU with alpha=0.05
Although the data could be next to random, shouldn't a sufficiently large network easily overfit onto 50 samples or less?
2 suggestions:
It appears you have a binary classification problem (either 0 or 1), perhaps you could try a binary cross entropy loss instead?
Are you using a method such as to_categorical to one-hot encode your labels?
Other factors that can sometimes dramatically affect accuracies that you haven't mentioned trying/changing:
Using different optimizers
Exploring different architectures: have you considered maybe a CNN-LSTM model? Or have you tested on different architectures, do some learn better than others?
I'm learning the simplest neural networks using Dense layers using Keras. I'm trying to implement face recognition on a relatively small dataset (In total ~250 images with 50 images per class).
I've downloaded the images from google images and resized them to 100 * 100 png files. Then I've read those files into a numpy array and also created a one hot label array for training my model.
Here is my code for processing the training data:
X, Y = [], []
feature_map = {
'Alia Bhatt': 0,
'Dipika Padukon': 1,
'Shahrukh khan': 2,
'amitabh bachchan': 3,
'ayushmann khurrana': 4
}
for each_dir in os.listdir('.'):
if os.path.isdir(each_dir):
for each_file in os.listdir(each_dir):
X.append(cv2.imread(os.path.join(each_dir, each_file), -1).reshape(1, -1))
Y.append(feature_map[os.path.basename(each_file).split('-')[0]])
X = np.squeeze(X)
X = X / 255.0 # normalize the training data
Y = np.array(Y)
Y = np.eye(5)[Y]
print (X.shape)
print (Y.shape)
This is printing (244, 40000) and (244, 5). Here is my model:
model = Sequential()
model.add(Dense(8000, input_dim = 40000, activation = 'relu'))
model.add(Dense(1200, activation = 'relu'))
model.add(Dense(700, activation = 'relu'))
model.add(Dense(100, activation = 'relu'))
model.add(Dense(5, activation = 'softmax'))
# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=25, batch_size=15)
When I train the model, It stuck at the accuracy 0.2172, which is almost the same as random predictions (0.20).
I've also tried to train mode with grayscale images but still not getting expected accuracy. Also tried with different network architectures by changing the number of hidden layers and neurons in hidden layers.
What am I missing here? Is my dataset too small? or am I missing any other technical detail?
For more details of code, here is my notebook: https://colab.research.google.com/drive/1hSVirKYO5NFH3VWtXfr1h6y0sxHjI5Ey
Two suggestions I can make:
Your data set is probably too small. If you are splitting training and validation at 80/20, that means you are only training on 200 images, which is probably too small. Try increasing your data set to see if results improve.
I would recommend adding Dropout to each layer of your network as your training set is so small. Your network is most likely over-fitting your training data set since it is so small, and Dropout is an easy way to help avoid this problem.
Let me know if these suggestions make a difference!
I agree that the dataset is too small, 50 instances of each person is probably not enough. You can use data augmentation with the keras ImageDataGenerator method to increase the number of images, and rewrite your numpy reshaping code as a pre-processing function for the generator. I also noticed that you haven't shuffled the data, so the network is likely predicting the first class for everything (which is maybe why the accuracy is near random chance).
If increasing the dataset size doesn't help, you'll probably have to play around with the learning rate for the Adam optimizer.
I want to classify pattern on image. My original image shape are 200 000*200 000 i reshape it to 96*96, pattern are still recognizable with human eyes. Pixel value are 0 or 1.
i'm using the following neural network.
train_X, test_X, train_Y, test_Y = train_test_split(cnn_mat, img_bin["Classification"], test_size = 0.2, random_state = 0)
class_weights = class_weight.compute_class_weight('balanced',
np.unique(train_Y),
train_Y)
train_Y_one_hot = to_categorical(train_Y)
test_Y_one_hot = to_categorical(test_Y)
train_X,valid_X,train_label,valid_label = train_test_split(train_X, train_Y_one_hot, test_size=0.2, random_state=13)
model = Sequential()
model.add(Conv2D(24,kernel_size=3,padding='same',activation='relu',
input_shape=(96,96,1)))
model.add(MaxPool2D())
model.add(Conv2D(48,kernel_size=3,padding='same',activation='relu'))
model.add(MaxPool2D())
model.add(Conv2D(64,kernel_size=3,padding='same',activation='relu'))
model.add(MaxPool2D())
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dense(16, activation='softmax'))
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
train = model.fit(train_X, train_label, batch_size=80,epochs=20,verbose=1,validation_data=(valid_X, valid_label),class_weight=class_weights)
I have already run some experiment to find a "good" number of hidden layer and fully connected layer. it's probably not the most optimal architecture since my computer is slow, i just ran different model once and selected best one with matrix confusion, i didn't use cross validation,I didn't try more complex architecture since my number of data is small, i have read small architecture are the best, is it worth to try more complex architecture?
here the result with 5 and 12 epoch, bach size 80. This is the confusion matrix for my test set
As you can see it's look like i'm overfiting. When i only run 5 epoch, most of the class are assigned to class 0; With more epoch, class 0 is less important but classification is still bad
I added 0.8 dropout after each convolutional layer
e.g
model.add(Conv2D(48,kernel_size=3,padding='same',activation='relu'))
model.add(MaxPool2D())
model.add(Dropout(0.8))
model.add(Conv2D(64,kernel_size=3,padding='same',activation='relu'))
model.add(MaxPool2D())
model.add(Dropout(0.8))
With drop out, 95% of my image are classified in class 0.
I tryed image augmentation; i made rotation of all my training image, still used weighted activation function, result didnt improve. Should i try to augment only class with small number of image? Most of the thing i read says to augment all the dataset...
To resume my question are:
Should i try more complex model?
Is it usefull to do image augmentation only on unrepresented class? then should i still use weight class (i guess no)?
Should i have hope to find a "good" model with cnn when we see the size of my dataset?
I think according to the imbalanced data, it is better to create a custom data generator for your model so that each of it's generated data batch, contains at least one sample from each class. And also it is better to use Dropout layer after each dense layer instead of conv layer. For data augmentation it is better to at least use combination of rotate, horizontal flip and vertical flip. there are some other approaches for data augmentation like using GAN network or random pixel replacement.
For Gan you can check This SO post
For using Gan as data augmenter you can read This Article.
For combination of pixel level augmentation and GAN pixel level data augmentation
What I used - in a different setting - was to upsample my data with ADASYN. This algorithm calculates the amount of new data required to balance your classes, and then takes available data to sample novel examples.
There is an implementation for Python. Otherwise, you also have very little data. SVMs are good performing even with little data. You might want to try them or other image classification algorithms depending where the expected pattern is always at the same position, or varies. Then you could also try the Viola–Jones object detection framework.
I am trying to estimate the third band(Blue) in an RGB image using convolutional neural networks. my design using Keras is a sequentiol model with a convolution2D layer as input layer two hidden layers and output neuron. if i want loss(rmse) to be zero how should i change my model?
my model in python goes like this
in_image = skimage.io.imread('test.jpg')[0:50,0:50,:].astype(float)
data = in_image[:,:,0:2]
target = in_image[:,:,2:3]
model1 = keras.models.Sequential()
model1.add(keras.layers.Convolution2D(50,(3,3),strides = (1,1),padding = "same",input_shape=(None,None,2))) #Convolution Layer
model1.add(keras.layers.Dense(50,activation = 'relu')) # Hiden Layer1
model1.add(keras.layers.Dense(50,activation = 'sigmoid')) # Hidden Layer 2
model1.add(keras.layers.Dense(1)) # Output Layer
adadelta = keras.optimizers.Adadelta(lr=1.0, rho=0.95, epsilon=1e-08, decay=0.0)
model1.compile(loss='mean_squared_error', optimizer=adadelta) # Compile the model
model1.fit(np.array([data]),np.array([target]),epochs = 5000)
estimated_band = model1.predict(np.array([data]))
Given your problem setup, it looks like you're trying to training a neural network on one image such that it is able to predict the blue channel of an image from other 2 images. Putting aside the use of such an experiment, there are a few important things when training neural networks properly, including.
learning rate
weight initialization
optimizer
model complexity.
Yann Lecun's Efficient backprop is a late 90s paper that talks about numbers 1, 2 and 3. Number 4 holds on the assumption that as the number of free parameters increase, at some point you'll be able to match each parameter to each output.
Note that achieving zero-loss provides no guarantees on generalization nor does it mean that your model will not generalize, as brilliantly described in a paper presented at ICLR.