I was trying to solve the Dogs vs. Cats Redux: Kernels Edition problem on Kaggle. It is a simple image classification problem. However, I am doing worse than a random predictor, with a log-loss score of 17+. Does anyone know why this might be?
Neural Network Model
def convolutional_neural_network():
    weights = {
        # 3x3x3 conv => 1x1x8
        'conv1': tf.Variable(tf.random_normal([3, 3, 3, 8])),
        # 5x5x8 conv => 1x1x16
        'conv2': tf.Variable(tf.random_normal([5, 5, 8, 16])),
        # 3x3x16 conv => 1x1x32
        'conv3': tf.Variable(tf.random_normal([3, 3, 16, 32])),
        # 32 FC => output_features
        'out': tf.Variable(tf.random_normal([(SIZE // 16) * (SIZE // 16) * 32, output_features]))
    }
    biases = {
        'conv1': tf.Variable(tf.random_normal([8])),
        'conv2': tf.Variable(tf.random_normal([16])),
        'conv3': tf.Variable(tf.random_normal([32])),
        'out': tf.Variable(tf.random_normal([output_features]))
    }
    conv1 = tf.add(conv2d(input_placeholder, weights['conv1'], 1), biases['conv1'])
    relu1 = relu(conv1)
    pool1 = maxpool2d(relu1, 4)
    conv2 = tf.add(conv2d(pool1, weights['conv2'], 1), biases['conv2'])
    relu2 = relu(conv2)
    pool2 = maxpool2d(relu2, 2)
    conv3 = tf.add(conv2d(pool2, weights['conv3'], 1), biases['conv3'])
    relu3 = relu(conv3)
    pool3 = maxpool2d(relu3, 2)
    pool3 = tf.reshape(pool3, shape=[-1, (SIZE // 16) * (SIZE // 16) * 32])
    output = tf.add(tf.matmul(pool3, weights['out']), biases['out'])
    return output
The output has no activation function.
Prediction, Optimizer and Loss Function
output_prediction = convolutional_neural_network()
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=output_prediction, labels=output_placeholder))
trainer = tf.train.AdamOptimizer()
optimizer = trainer.minimize(loss)
test_prediction = tf.nn.softmax(output_prediction)
The images are converted into a NumPy array of size 128x128x3 and fed into the neural network with a batch size of 64.
Full Code Here
Edit: I ran the same code for 200 epochs. No improvement; it actually did slightly worse.
This is more of a comment, but I don't have enough reputation points for that:
Did you normalize your data (i.e. divide the pixel values by 255)? I can't see you doing that in the script.
When you get terrible results like a log loss of 17, it means your model is always predicting one class with 100% confidence. Usually in this case it's not the architecture, the learning rate, or the number of epochs, but rather some silly mistake like forgetting to normalize or mixing up your labels. For this particular problem and given your architecture, you should see an accuracy of about 80% and a log loss of 0.4 within about 40 epochs. No need for thousands of epochs :)
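A minimal sketch of what I mean by normalizing, assuming the images have already been loaded into a NumPy array (the variable name images is a placeholder):
import numpy as np

# Assuming `images` is a uint8 array of shape (num_images, 128, 128, 3);
# scale the pixel values from [0, 255] down to [0, 1] before feeding them in.
images = images.astype(np.float32) / 255.0

# Optionally also center the data around zero.
images -= images.mean()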
Improving accuracy is more of an art than a one-step fix; you can try some of these methods (a short sketch follows the list):
Try different gradient optimizers: SGD, momentum, Nesterov, Adagrad, Adam, ...
Try an adaptive learning rate.
Improve regularization: L1, L2, dropout, DropConnect, ...
Augment your training data (get more data).
Change your network hyperparameters.
Finally, if nothing helps, change the network structure.
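A quick Keras-flavored sketch of a few of these knobs (a different optimizer, dropout, and an adaptive learning rate); the layer sizes and rates here are arbitrary placeholders, not tuned values:
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical small CNN illustrating dropout and an adaptive learning rate.
model = keras.Sequential([
    layers.Conv2D(16, 3, activation='relu', input_shape=(128, 128, 3)),
    layers.MaxPooling2D(2),
    layers.Dropout(0.25),          # regularization via dropout
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(2, activation='softmax'),
])

# A different optimizer: SGD with Nesterov momentum ...
opt = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

# ... and an adaptive learning rate that decays when the validation loss plateaus.
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3)
# model.fit(x_train, y_train, validation_split=0.1, epochs=40, callbacks=[reduce_lr])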
Related
I'm training a GAN with Keras in TensorFlow 2.0. I can get reasonable, though not great-looking, results until I try to add BatchNormalization layers.
I know that GAN training is very sensitive and there are many reasons it can diverge, but I want to know what is going wrong in this case: both the discriminator and generator losses drop to zero.
My network is like the common examples of DCGAN:
===== Generator =====
Input(128)
Dense(16384) => ReLU
Reshape(4 x 4 x 1024)
Conv2DTranspose(8 x 8 x 512, kernel=4, stride=2) => ReLU
Conv2DTranspose(16 x 16 x 256, kernel=4, stride=2) => ReLU
Conv2DTranspose(32 x 32 x 128, kernel=4, stride=2) => ReLU
Conv2DTranspose(64 x 64 x 3, kernel=4, stride=2, activation=sigmoid)
===== Discriminator =====
Conv2D(32 x 32 x 64, kernel=3, stride=2) => LeakyReLU(alpha=0.2)
Conv2D(16 x 16 x 128, kernel=3, stride=2) => LeakyReLU(alpha=0.2)
Conv2D(8 x 8 x 256, kernel=3, stride=2) => LeakyReLU(alpha=0.2)
Conv2D(4 x 4 x 512, kernel=3, stride=2) => LeakyReLU(alpha=0.2)
Flatten(8192)
Dense(1, activation=sigmoid)
I also follow the commonly suggested DCGAN training settings (a rough Keras sketch of the model and these settings follows the list):
Kernel init = RandomNormal, stddev=0.02
Optimizer = Adam, beta1 = 0.5
Learning rate = 0.0002
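For reference, here is a rough Keras sketch of the architecture and settings described above. This is my reconstruction, not the exact code, and the padding and initializer placement are assumptions:
from tensorflow import keras
from tensorflow.keras import layers

init = keras.initializers.RandomNormal(stddev=0.02)

# Generator: 128-d latent vector -> 64x64x3 image with a sigmoid output.
generator = keras.Sequential([
    layers.Dense(4 * 4 * 1024, activation='relu', kernel_initializer=init, input_shape=(128,)),
    layers.Reshape((4, 4, 1024)),
    layers.Conv2DTranspose(512, 4, strides=2, padding='same', activation='relu', kernel_initializer=init),
    layers.Conv2DTranspose(256, 4, strides=2, padding='same', activation='relu', kernel_initializer=init),
    layers.Conv2DTranspose(128, 4, strides=2, padding='same', activation='relu', kernel_initializer=init),
    layers.Conv2DTranspose(3, 4, strides=2, padding='same', activation='sigmoid', kernel_initializer=init),
])

# Discriminator: 64x64x3 image -> single real/fake probability.
discriminator = keras.Sequential([
    layers.Conv2D(64, 3, strides=2, padding='same', kernel_initializer=init, input_shape=(64, 64, 3)),
    layers.LeakyReLU(0.2),
    layers.Conv2D(128, 3, strides=2, padding='same', kernel_initializer=init),
    layers.LeakyReLU(0.2),
    layers.Conv2D(256, 3, strides=2, padding='same', kernel_initializer=init),
    layers.LeakyReLU(0.2),
    layers.Conv2D(512, 3, strides=2, padding='same', kernel_initializer=init),
    layers.LeakyReLU(0.2),
    layers.Flatten(),
    layers.Dense(1, activation='sigmoid', kernel_initializer=init),
])
discriminator.compile(optimizer=keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5),
                      loss='binary_crossentropy')

# Stacked model used to train the generator (discriminator frozen inside it).
discriminator.trainable = False
gan = keras.Sequential([generator, discriminator])
gan.compile(optimizer=keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5),
            loss='binary_crossentropy')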
My dataset contains 2048 images, all from a single specified class.
On my first try, I train the network in the following order (a rough sketch of this loop appears below):
1. Draw 128 real samples with small spatial augmentation.
2. Generate 128 fake samples with the current generator.
3. Stack the samples and train the discriminator on these 256 samples as one batch.
4. Generate 256 random latent vectors.
5. Train the generator on these 256 vectors as one batch.
The loss values are averaged and reported after every epoch.
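Sketched in code, the loop looks roughly like this (the helper sample_real_images and the stacked gan model are placeholders for my actual code; gan is the usual generator-plus-frozen-discriminator model compiled with binary cross-entropy):
import numpy as np

def train_one_epoch(generator, discriminator, gan, latent_dim=128, batch_size=128):
    # Steps 1-3: one stacked real/fake batch for the discriminator.
    real = sample_real_images(batch_size)                      # (128, 64, 64, 3), lightly augmented
    noise = np.random.normal(size=(batch_size, latent_dim))
    fake = generator.predict(noise)
    x = np.concatenate([real, fake])
    y = np.concatenate([np.ones((batch_size, 1)), np.zeros((batch_size, 1))])
    d_loss = discriminator.train_on_batch(x, y)

    # Steps 4-5: 256 latent vectors for the generator, labelled "real" so the
    # generator is pushed to raise D(G(z)).
    noise = np.random.normal(size=(2 * batch_size, latent_dim))
    g_loss = gan.train_on_batch(noise, np.ones((2 * batch_size, 1)))
    return d_loss, g_loss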
I get fair results with these settings. The discriminator loss stays around 0.60-0.70 and the generator loss around 0.70-1.00, but the improvement in quality seems slow. So I add a batch normalization layer to all (transposed) convolutions except the one at the generator output, as commonly suggested.
After adding batch normalization, the training loss becomes much more unstable, but it does not diverge outright. The discriminator loss drops to 0.20-0.40 and the generator loss varies between 1.00 and 3.00.
I have tried momentum = 0.8 and 0.9; both give similar behavior.
Then I try NOT stacking the real/fake samples into a single batch, but rather training the discriminator on 128 real samples, then on 128 fake samples, still using the batch normalization layers.
Under this setting, the discriminator and generator losses both drop rapidly toward 0 after the first epoch. The generated images look like strong color noise at every pixel, and the predicted probabilities (after the sigmoid) for these noisy images are all close to 1.0.
If I remove all batch normalization layers but just train the real/fake samples separately, this problem does not happen.
If the generator can fool the discriminator with noisy images and get a high probability, why can the discriminator loss still be very close to zero after its own training step? Does the batch normalization layer have some bad effect in this scenario?
This problem is also called the saturation problem; it means that the discriminator has trained beyond the point where the generator can still learn from it. In simple words, it happens because of the vanishing-gradient problem of GANs. If you want to understand the mathematics behind this, I encourage you to read this article carefully.
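In the standard GAN formulation (textbook material, not specific to this post), the generator's original minimax objective and its non-saturating replacement are

L_G^{\text{saturating}} = \mathbb{E}_{z}\big[\log\big(1 - D(G(z))\big)\big], \qquad L_G^{\text{non-saturating}} = -\,\mathbb{E}_{z}\big[\log D(G(z))\big]

Once the discriminator confidently rejects the fakes, D(G(z)) is close to 0 and the gradient of the saturating form vanishes, so the generator stops improving while the discriminator keeps winning.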
I'm learning simple neural networks with Dense layers in Keras. I'm trying to implement face recognition on a relatively small dataset (about 250 images in total, with 50 images per class).
I've downloaded the images from Google Images and resized them to 100x100 PNG files. Then I've read those files into a NumPy array and also created a one-hot label array for training my model.
Here is my code for processing the training data:
X, Y = [], []
feature_map = {
    'Alia Bhatt': 0,
    'Dipika Padukon': 1,
    'Shahrukh khan': 2,
    'amitabh bachchan': 3,
    'ayushmann khurrana': 4
}
for each_dir in os.listdir('.'):
    if os.path.isdir(each_dir):
        for each_file in os.listdir(each_dir):
            X.append(cv2.imread(os.path.join(each_dir, each_file), -1).reshape(1, -1))
            Y.append(feature_map[os.path.basename(each_file).split('-')[0]])

X = np.squeeze(X)
X = X / 255.0  # normalize the training data
Y = np.array(Y)
Y = np.eye(5)[Y]

print(X.shape)
print(Y.shape)
This is printing (244, 40000) and (244, 5). Here is my model:
model = Sequential()
model.add(Dense(8000, input_dim = 40000, activation = 'relu'))
model.add(Dense(1200, activation = 'relu'))
model.add(Dense(700, activation = 'relu'))
model.add(Dense(100, activation = 'relu'))
model.add(Dense(5, activation = 'softmax'))
# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=25, batch_size=15)
When I train the model, it gets stuck at an accuracy of 0.2172, which is almost the same as random guessing (0.20).
I've also tried training the model with grayscale images, but I'm still not getting the expected accuracy. I also tried different network architectures, changing the number of hidden layers and the number of neurons per hidden layer.
What am I missing here? Is my dataset too small, or am I missing some other technical detail?
For more details of code, here is my notebook: https://colab.research.google.com/drive/1hSVirKYO5NFH3VWtXfr1h6y0sxHjI5Ey
Two suggestions I can make:
Your data set is probably too small. If you are splitting training and validation at 80/20, that means you are only training on 200 images, which is probably too small. Try increasing your data set to see if results improve.
I would recommend adding Dropout to each layer of your network since your training set is so small. Your network is most likely over-fitting such a small training set, and Dropout is an easy way to help avoid this problem (see the sketch below this answer).
Let me know if these suggestions make a difference!
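A minimal sketch of the second suggestion applied to the model from the question (the dropout rates are just common defaults, not tuned values):
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(8000, input_dim=40000, activation='relu'))
model.add(Dropout(0.5))   # randomly drop half the activations during training
model.add(Dense(1200, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(700, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(100, activation='relu'))
model.add(Dense(5, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])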
I agree that the dataset is too small; 50 instances of each person is probably not enough. You can use data augmentation with the Keras ImageDataGenerator class to increase the number of images, and rewrite your NumPy reshaping code as a pre-processing function for the generator. I also noticed that you haven't shuffled the data, so the network is likely predicting the first class for everything (which may be why the accuracy is near random chance). A sketch of both ideas is below.
If increasing the dataset size doesn't help, you'll probably have to play around with the learning rate for the Adam optimizer.
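A minimal sketch of both points (shuffling plus augmentation), assuming X and Y are the arrays built in the question; the augmentation parameters are only examples, and the reshape assumes the 40000-dimensional rows are 100x100 images with 4 channels (cv2.imread with -1 keeps the alpha channel):
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

# Shuffle the samples so batches aren't dominated by a single class.
perm = np.random.permutation(len(X))
X, Y = X[perm], Y[perm]

# ImageDataGenerator expects image-shaped input, so undo the flattening.
X_img = X.reshape(-1, 100, 100, 4)

datagen = ImageDataGenerator(rotation_range=15,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)

# The Dense model above expects flat 40000-dim inputs, so either add a
# Flatten() layer at the front of the model or re-flatten each augmented batch.
# model.fit_generator(datagen.flow(X_img, Y, batch_size=15),
#                     steps_per_epoch=len(X_img) // 15, epochs=25)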
I am working on a stock market prediction project using sentiment analysis. I am trying to create a CNN model where I pass in 4000 days of stock data with a batch size of 100. At the end of the dense layer, I want to add a regression layer to get the price of the stock.
def Model(train_data):
    input_layer = tf.reshape(tf.cast(train_data, tf.float32), [-1, 1, 100, 2])
    conv1 = tf.layers.conv2d(inputs=input_layer, filters=32, kernel_size=[1, 5], padding="same",
                             activation=tf.nn.relu, strides=1,
                             kernel_initializer=tf.contrib.layers.variance_scaling_initializer())
    pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[1, 2], strides=[1, 2])
    conv2 = tf.layers.conv2d(inputs=pool1, filters=8, kernel_size=[1, 5], padding="same",
                             activation=tf.nn.relu, strides=1,
                             kernel_initializer=tf.contrib.layers.variance_scaling_initializer())
    pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[1, 5], strides=[1, 5])
    conv3 = tf.layers.conv2d(inputs=pool2, filters=2, kernel_size=[1, 2], padding="same",
                             activation=tf.nn.relu, strides=1,
                             kernel_initializer=tf.contrib.layers.variance_scaling_initializer())
    pool3 = tf.layers.max_pooling2d(inputs=conv3, pool_size=[1, 2], strides=[1, 2])
    pool3_flat = tf.reshape(pool3, [40, 1 * 5 * 2])
    dense = tf.layers.dense(inputs=pool3_flat, units=5, activation=tf.nn.relu)
    dropout = tf.layers.dropout(inputs=dense, rate=0.2, training=mode == tf.estimator.ModeKeys.TRAIN)
    logits = tf.layers.dense(inputs=dropout, units=1)
I am referring to https://www.tensorflow.org/tutorials/estimators/cnn for the model, but they are doing classification. Can anybody suggest an approach for regression? The train_data for the model has a shape of [2, 4000], where one row holds the normalized stock prices and the other holds the sentiment factor.
The only thing you would have to do is add a fully connected layer at the very end and select a linear activation. Intuitively, this takes the outputs of your conv layers and applies y = mx + b to them. Your fully connected output layer would have 40 nodes (one for each output). In fact, you already have a dense layer in that code; if your output is of size 40, just make it 40 units instead of 5. A sketch follows below.
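In the same tf.layers style as the question, the regression head could look roughly like this (a sketch: the mean-squared-error loss and the placeholder name are my assumptions, and the number of units should match however many values you actually predict per example):
# Final regression head: a dense layer with no activation, i.e. a linear output.
# `dropout` is the tensor from the model in the question.
predictions = tf.layers.dense(inputs=dropout, units=1, activation=None)

# Targets and a standard regression loss.
labels = tf.placeholder(tf.float32, shape=[None, 1])
loss = tf.losses.mean_squared_error(labels=labels, predictions=predictions)
train_op = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(loss)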
Just a side note: traditionally, CNNs were used for image classification, and only recently have they started migrating to other applications (such as spam detection). I would advise trying a simple feed-forward neural network first, and if that does not work, perhaps an RNN, before this.
I am training a CNN to classify 28x28 RGB images into one of 200 categories.
The classifier reaches ~95% accuracy on the train set.
The test images are obtained by taking a screenshot, cropping and resizing the roi to 28x28.
This image processing causes a slight difference in the train and test images (example attached).
Even though the difference is almost imperceptible to the human eye it causes a huge drop in accuracy for my classifier.
My classifier reaches up to 95% accuracy on the train set but only ~10% on the test set.
I started applying random perturbations to the training images (blur, pixelation, noise, translation, scaling) and started blurring the test images but test accuracy only barely improved.
How can I make my classifier robust so that it generalizes over slight pixel differences?
Here is my network
network = input_data(shape=[None, img_size[0], img_size[1], 3], name='input')
conv1 = relu(batch_normalization(
    conv_2d(network, 16, 3, bias=False, activation=None, regularizer="L2"), trainable=is_training))
conv2 = relu(batch_normalization(
    conv_2d(conv1, 32, 3, bias=False, activation=None, regularizer="L2"), trainable=is_training))
conv3 = relu(batch_normalization(
    conv_2d(conv2, 64, 3, bias=False, activation=None, regularizer="L2"), trainable=is_training))
net = fully_connected(conv3, 128, activation='relu', regularizer="L2")
net = fully_connected(net, num_elements, activation='softmax')
return regression(net, optimizer='adam', learning_rate=learning_rate,
                  loss='categorical_crossentropy', name='target')
Train image:
Test image
200 categories is a lot. Are you sure something is not dominating the other classes, e.g. the model is not guessing 'background' all the time and being right 95% of the time just because 95% of the images are 'background'?
Pooling (p. 335 onwards), for example max pooling, is one way to introduce invariance to small transformations. You should try it out.
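In the same TFLearn-style code as the question, that would mean inserting max_pool_2d between the conv blocks, roughly like this (a sketch reusing the relu/batch_normalization helpers from the question, not the exact fix):
from tflearn.layers.conv import conv_2d, max_pool_2d

conv1 = relu(batch_normalization(
    conv_2d(network, 16, 3, bias=False, activation=None, regularizer="L2"), trainable=is_training))
pool1 = max_pool_2d(conv1, 2)   # 28x28 -> 14x14: adds some invariance to small shifts
conv2 = relu(batch_normalization(
    conv_2d(pool1, 32, 3, bias=False, activation=None, regularizer="L2"), trainable=is_training))
pool2 = max_pool_2d(conv2, 2)   # 14x14 -> 7x7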
Other ways to limit overfitting are tuning the L2 regularization you are already using, adding dropout to the fully connected layer, and not using too large a minibatch size. You could also add small rotations to the list of augmentations you are doing, if you find them appropriate, and maybe random reflections too, if you expect those in the real world. I don't think it's about the augmentation, though.
And finally, my personal favourite: human error. Usually when I see something this odd, it was just my own fault. You should go through the code and intermediate variables again, more than once.
I'm doing binary classification with a CNN using Matconvnet, and now I'm trying to reproduce it with Keras in Python. The network is not complex at all, and I achieved 96% accuracy with Matconvnet. However, with Keras, even though I tried my best to ensure every setting is exactly the same as before, I can't get the same result. Or even worse, the model doesn't work at all.
Here are some details about the setup. Any ideas or help are appreciated!
Input
The images are 20x20. The training set has 400 images, the test set 100, and the validation set 132.
Matconvnet: images stored as 20 x 20 x sample_size
Keras: images stored as sample_size x 20 x 20 x 1
CNN Structure
(3x3)x3 conv -> (2x2) max pooling -> fully connected -> softmax -> log loss
Matconvnet: uses a convolutionized layer instead of a fully connected one. Here is the code:
function net = initializeCNNA()
    f = 1/100;
    net.layers = {};
    net.layers{end+1} = struct('type', 'conv', ...
        'weights', {{f*randn(3,3,1,3, 'single'), zeros(1, 3, 'single')}}, ...
        'stride', 1, ...
        'pad', 0);
    net.layers{end+1} = struct('type', 'pool', ...
        'method', 'max', ...
        'pool', [2 2], ...
        'stride', 2, ...
        'pad', 0);
    net.layers{end+1} = struct('type', 'conv', ...
        'weights', {{f*randn(9,9,3,2, 'single'), zeros(1,2,'single')}}, ...
        'stride', 1, ...
        'pad', 0);
    net.layers{end+1} = struct('type', 'softmaxloss');
    net = vl_simplenn_tidy(net);
Keras:
model = Sequential()
model.add(Conv2D(3, (3, 3),
                 kernel_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.1, seed=None),
                 input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Flatten())
model.add(Dense(2, activation='softmax',
                kernel_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.1, seed=None)))
Loss Function
Matconvnet: softmaxloss
Keras: binary_crossentropy
Optimizer
Matconvnet: SGD
trainOpts.batchSize = 50;
trainOpts.numEpochs = 20 ;
trainOpts.learningRate = 0.001 ;
trainOpts.weightDecay = 0.0005 ;
trainOpts.momentum = 0.9 ;
Keras: SGD
sgd = optimizers.SGD(lr=0.001, momentum=0.9, decay=0.0005)
model.compile(loss='binary_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])
Initialization: filters:N(0,0.1), bias: 0
Normalization: no batch normalization; the only normalization is of the input images, to zero mean and unit standard deviation.
Above are the aspects I reviewed to make sure I did the correct replication. Yet I don't understand why it doesn't work on Keras. Here are some guesses:
Matconvnet uses a convolutionized layer instead of fully connected layer and may imply some fancy way to update the parameters.
They use different SGD implementations whose parameters have different meanings.
I also tried other things:
Changing the optimizer in Keras to Adadelta(): no improvement.
Changing the network structure and making it deeper: it works!
But I still want to know why Matconvnet can achieve such a good result with a much simpler network.
"Matconvnet uses a convolutionized layer instead of fully connected layer and may imply some fancy way to update the parameters."
No. Technically, there should be no difference between convolution and fully connected layers. I'm pretty sure there's no fancy way to update the parameters.
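To illustrate (a sketch, not from the original post): after the 3x3 conv and 2x2 pooling the feature map is 9x9x3, and a convolution whose kernel covers that whole map computes exactly the same linear function, with the same number of parameters, as Flatten followed by Dense:
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense, Reshape

# A 9x9 convolution with 2 filters over a 9x9x3 feature map produces a 1x1x2
# output: 9*9*3*2 weights + 2 biases, exactly like the Matconvnet 9x9x3x2 layer.
conv_version = Sequential([
    Conv2D(2, (9, 9), input_shape=(9, 9, 3)),
    Reshape((2,)),
])

# Flatten + Dense(2) has the same 488 parameters and computes the same linear map.
dense_version = Sequential([
    Flatten(input_shape=(9, 9, 3)),
    Dense(2),
])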
More comments coming..
Some of the discussion in this post may help:
Can't replicate a matconvnet CNN architecture in Keras