Validation and training accuracy high in the first epoch [Keras] - python

I am training an image classifier with 2 classes and 53k images, and validating it with 1.3k images using Keras. Here is the structure of the neural network:
from keras.models import Sequential
from keras.layers import Flatten, Dense, Dropout

model = Sequential()
model.add(Flatten(input_shape=train_data.shape[1:]))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam',
              loss='binary_crossentropy', metrics=['accuracy'])
Training accuracy increases from ~50% to ~85% in the first epoch, with 85% validation accuracy. Subsequent epochs increase the training accuracy consistently; however, validation accuracy stays in the 80-90% region.
I'm curious: is it possible to get high validation and training accuracy in the first epoch? If my understanding is correct, accuracy should start small and increase steadily with each passing epoch.
Thanks
EDIT: The image size is 150x150 after rescaling, and the mini-batch size is 16.

Yes, it is entirely possible to get high accuracy in the first epoch and then only modest improvements afterwards.
If there is enough redundancy in the data and you make enough updates in the first epoch relative to the complexity of your model (which seems fairly easy to optimize), i.e. you use small minibatches, it's entirely possible that you learn most of the important structure during that first epoch. When you show the data again, the model will start overfitting to peculiarities introduced by the specific images in your train set (hence the increasing training accuracy), but since you provide no novel samples, it will not learn anything new about the underlying properties of your classes.
You can think of your training data as an infinite stream (which is actually what SGD would need in order to enjoy all of its convergence theorems). Do you think you need more than 50k samples to learn what is important? You can test the data-hunger of your model by providing less data, or by reporting performance after some sub-epoch number of updates.
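As a hedged sketch of that data-hunger test (train_data comes from the question; train_labels, val_data, and val_labels are assumed names for the remaining arrays):
import numpy as np
from keras.models import Sequential
from keras.layers import Flatten, Dense, Dropout

def build_model(input_shape):
    # Same architecture as in the question
    model = Sequential()
    model.add(Flatten(input_shape=input_shape))
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

# Train on growing fractions of the data for one epoch each and compare
# validation accuracy: if a small fraction already reaches ~85%, a single
# epoch on the full set plausibly learns most of the important structure.
for frac in [0.1, 0.25, 0.5, 1.0]:
    n = int(frac * len(train_data))
    idx = np.random.choice(len(train_data), n, replace=False)
    model = build_model(train_data.shape[1:])
    model.fit(train_data[idx], train_labels[idx], batch_size=16, epochs=1,
              validation_data=(val_data, val_labels), verbose=0)
    _, val_acc = model.evaluate(val_data, val_labels, verbose=0)
    print('%.0f%% of data -> val acc %.3f' % (100 * frac, val_acc))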

You cannot expect to get accuracy over 90-95% on image classification using feed-forward neural networks.
You need another architecture, the convolutional neural network (CNN), which is the state of the art in image recognition.
It is also very easy to build one using Keras, though it is computationally more intensive than this model.
If you want to stick with feed-forward layers, the best thing you can do is early stopping, but even that won't give you accuracy over 90%.

Yes, epochs are supposed to fit the data to the model.
Try using 2 neurons in the output layer and one-hot encoding your class labels!
I have seen at least one case where that gave better results than a single binary output.
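A minimal sketch of this alternative, assuming y holds the 0/1 class labels; mathematically a 2-neuron softmax is equivalent to a single sigmoid, so any gains are anecdotal:
from keras.models import Sequential
from keras.layers import Flatten, Dense, Dropout
from keras.utils import to_categorical

# One-hot encode the binary labels: 0 -> [1, 0], 1 -> [0, 1]
y_onehot = to_categorical(y, num_classes=2)

model = Sequential()
model.add(Flatten(input_shape=train_data.shape[1:]))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))  # two output neurons instead of one sigmoid
model.compile(optimizer='adam',
              loss='categorical_crossentropy',  # matches the one-hot targets
              metrics=['accuracy'])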

Related

Different overfitting for three models with the same structure

I want to design three models that have the same structure, but at the end one of them should show serious overfitting, another less overfitting, and the last one no overfitting.
The idea is that I want to see how much information exists in the last layer of each model for some test data. Let's say I'm using the MNIST dataset as the training and testing set, and the structure of all models should be like this:
from keras.models import Sequential
from keras.layers import Dense

# Network architecture
network = Sequential()
# Input layer
network.add(Dense(512, activation='relu', input_shape=(28*28,)))
# Hidden layer
network.add(Dense(64, activation='relu', name='features'))
# Output layer
network.add(Dense(10, activation='softmax'))
network.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model
history = network.fit(train_img, train_label, epochs=50, batch_size=256, validation_split=0.2)
So now the question is how to change this training setup so that it fulfills my needs for three models with different degrees of overfitting.
I'm new to machine learning topics, and I hope I have explained my question as well as possible.
Thanks in advance
Overfit:
The MNIST dataset is rather simple, therefore it should be easy to overfit with the model you are suggesting. Increase the number of epochs: eventually, your model will memorize the training data very well. If you struggle to overfit the data, you might need a more complex network - but I doubt that this will be the case.
Just right:
Probably the easiest way to obtain a model which is just right (neither overfit nor underfit) is to use a callback. Specifically, we can use early stopping: the callback will stop training if the validation loss stops improving. For your code, all you have to do is modify the training as follows.
First define a callback:
callback_es = tf.keras.callbacks.EarlyStopping(monitor='val_loss')
Then add the callback to your training (note the argument is callbacks, plural):
history = network.fit(train_img, train_label, epochs=50, batch_size=256, validation_split=0.2, callbacks=[callback_es])
Underfit:
Similar idea as with overfitting. In this case, you want to stop your training early on: train your model for a limited number of epochs only. If you find that your model overfits too quickly, try lowering the learning rate.
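A consolidated sketch of the three regimes; build_network is an assumed helper that returns a freshly compiled copy of the architecture above:
import tensorflow as tf

# Overfit: many epochs, no stopping criterion; the network memorizes the train set.
overfit = build_network()
overfit.fit(train_img, train_label, epochs=200, batch_size=256, validation_split=0.2)

# Just right: stop once validation loss stops improving (patience and
# restore_best_weights are optional extras on top of the answer above).
es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3,
                                      restore_best_weights=True)
just_right = build_network()
just_right.fit(train_img, train_label, epochs=200, batch_size=256,
               validation_split=0.2, callbacks=[es])

# Underfit: stop long before convergence.
underfit = build_network()
underfit.fit(train_img, train_label, epochs=2, batch_size=256, validation_split=0.2)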

Keras neural network predicts the same number for all inputs

I am trying to create a Keras neural network to predict the road distance between two points in a city. I am using Google Maps to get the travel distance, and then training the neural network on that data.
import pandas as pd

arr = []
for i in range(0, 100):
    arr.append(generateTwoPoints(55.901819, 37.344735, 55.589537, 37.832254))
df = pd.DataFrame(arr, columns=['p1Lat', 'p1Lon', 'p2Lat', 'p2Lon', 'distnaceInMeters', 'timeInSeconds'])
print(df)
Neural network architecture:
from keras.optimizers import SGD
from keras.models import Sequential
from keras.layers import Dense, Activation

sgd = SGD(lr=0.00000001)
model = Sequential()
model.add(Dense(100, input_dim=4, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mse', optimizer='sgd', metrics=['mse'])
Then I divide the data into train/test sets:
Xtrain=train[['p1Lat','p1Lon','p2Lat','p2Lon']]/100
Ytrain=train[['distnaceInMeters']]/100000
Xtest=test[['p1Lat','p1Lon','p2Lat','p2Lon']]/100
Ytest=test[['distnaceInMeters']]/100000
Then I fit the data to the model, but the loss stays the same:
history = model.fit(Xtrain, Ytrain,
                    batch_size=1,
                    epochs=1000,
                    # We pass some validation data for
                    # monitoring validation loss and metrics
                    # at the end of each epoch
                    validation_data=(Xtest, Ytest))
I later print the data:
prediction = model.predict(Xtest)
print(prediction)
print (Ytest)
But the result is nearly the same for all inputs:
[[0.26150784]
[0.26171574]
[0.2617755 ]
[0.2615582 ]
[0.26173398]
[0.26166356]
[0.26185763]
[0.26188275]
[0.2614446 ]
[0.2616575 ]
[0.26175532]
[0.2615183 ]
[0.2618127 ]]
distnaceInMeters
2 0.13595
6 0.27998
7 0.48849
16 0.36553
21 0.37910
22 0.40176
33 0.09173
39 0.24542
53 0.04216
55 0.38212
62 0.39972
64 0.29153
87 0.08788
I cannot find the problem. What is it? I am new to machine learning.
You are making a very elementary mistake: since you are in a regression setting, you should not use a sigmoid activation for your final layer (that is used for binary classification cases); change your last layer to
model.add(Dense(1,activation='linear'))
or even
model.add(Dense(1))
since, according to the docs, if you do not specify the activation argument it defaults to linear.
Various other advice already offered in the other answer and the comments may be useful (lower learning rate, more layers, other optimizers such as Adam), and you certainly need to increase your batch size; but nothing will work with the sigmoid activation you currently use in your last layer.
Irrelevant to the issue, but in regression settings you don't need to repeat your loss function as a metric; this
model.compile(loss='mse', optimizer='sgd')
will suffice.
It would be very useful if you could post the progression of the loss and MSE (for both the training and validation/test set) throughout training. Even better, it would be best if you could visualize it as per https://machinelearningmastery.com/display-deep-learning-model-training-history-in-keras/ and post the visualization here.
In the meantime, based on the facts:
1) You say the loss isn't decreasing (I'm assuming on the training set, during training, based on your compile args).
2) You say that the prediction "accuracy" on your test set is bad.
3) My experience/intuition (not an empirical assessment) tells me that your two-layer dense model is a little too small to capture the complexity inherent in your data, i.e. your model suffers from too high a bias: https://towardsdatascience.com/understanding-the-bias-variance-tradeoff-165e6942b229
The fastest and easiest thing you can try is to add both more layers and more nodes to each layer.
However, I should note that there is a lot of causal information affecting driving distance and driving time beyond just the distance between two coordinates, which might be the feature your neural network most readily extracts. For example: whether you drive on a highway or side streets, traffic lights, whether the roads twist and turn or go straight... To infer all of that from just this data, you would need enormous amounts of examples, in my opinion. If you could add input columns, e.g. the distance to the nearest highway from both points, you might be able to train with less data.
I would also recommend that you double-check that you are feeding as input what you think you are feeding (and its shape), and that you apply some standardization with sklearn, which might help the model learn faster and converge to a higher "accuracy".
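A short sketch of that standardization with sklearn's StandardScaler (fit on the training set only, then reused for the test set):
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
Xtrain_scaled = scaler.fit_transform(Xtrain)  # learn mean/std from the training data
Xtest_scaled = scaler.transform(Xtest)        # apply the same statistics to the test set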
If and when you post either more code or the training history (and how many training samples you have), I can help you more.
EDIT 1: Try changing the batch size to a larger number, preferably batch_size=32 if it fits in your memory. You can use a small batch size (such as 1) when working with an "info-rich" input like an image, but with a very "info-poor" datum like 4 floats (2 coordinates), the gradient of each size-1 batch points in a practically (pseudo-)random direction and does not necessarily get any closer to a local minimum. Only when taking the gradient over the collective loss of a larger batch (like 32, and perhaps more) will you get a gradient that points at least approximately in the direction of the local minimum and converge to a better result. Also, I suggest that you don't set the learning rate manually, and instead change to an optimizer like Adam or RMSProp.
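A hedged sketch of this edit, assuming the linear output layer fix from the other answer has been applied:
# Recompile with Adam (which adapts the step size itself) and drop the
# redundant mse metric, then train with a larger batch size.
model.compile(loss='mse', optimizer='adam')
history = model.fit(Xtrain, Ytrain,
                    batch_size=32,
                    epochs=1000,
                    validation_data=(Xtest, Ytest))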
EDIT 2: @desertnaut made an excellent point that I totally missed, a correction without which your code will not work properly. He deserves the credit, so I will not include it here; please refer to his answer. Also, don't forget to raise your batch size and to avoid setting the learning rate manually; Adam, for example, will handle that for you.

Getting Validation Accuracy of 99% with MNIST with less than 10000 parameters CNN

Given the MNIST dataset in Keras, the challenge is to develop a CNN model with fewer than 10k parameters and 99% validation accuracy.
I tried building such a model, but I am getting an accuracy of 98.71%.
import keras
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, concatenate

def create_model():
    inputs = Input(shape=(28, 28, 1))  # MNIST images
    lay1 = Conv2D(2, kernel_size=(1, 1), activation='relu', padding='same')(inputs)
    lay1 = Conv2D(2, kernel_size=(7, 7), strides=(2, 2), activation='relu', padding='same')(lay1)
    lay1 = Conv2D(2, kernel_size=(1, 1), activation='relu', padding='same')(lay1)
    lay1 = MaxPooling2D(pool_size=(7, 7), strides=(2, 2), padding='same')(lay1)
    lay2 = Conv2D(4, kernel_size=(1, 1), activation='relu', padding='same')(inputs)
    lay2 = Conv2D(4, kernel_size=(7, 7), strides=(2, 2), activation='relu', padding='same')(lay2)
    lay2 = Conv2D(4, kernel_size=(1, 1), activation='relu', padding='same')(lay2)
    lay2 = MaxPooling2D(pool_size=(7, 7), strides=(2, 2), padding='same')(lay2)
    lay3 = Conv2D(6, kernel_size=(1, 1), activation='relu', padding='same')(inputs)
    lay3 = Conv2D(6, kernel_size=(7, 7), strides=(2, 2), activation='relu', padding='same')(lay3)
    lay3 = Conv2D(6, kernel_size=(1, 1), activation='relu', padding='same')(lay3)
    lay3 = MaxPooling2D(pool_size=(7, 7), strides=(2, 2), padding='same')(lay3)
    fc = concatenate([lay1, lay2, lay3])
    fc = Flatten()(fc)
    fc = Dense(10, activation='relu')(fc)
    outputs = Dense(10, activation='softmax')(fc)
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])
    return model
The total parameter count comes to 8,862; the batch size used above is 32 and the number of epochs is 10.
Can you suggest ways to improve the model, under the constraint on the number of parameters, so that the validation accuracy reaches 99% or above?
You should be able to add another Conv/Pool block if you reduce the kernel_size in the previous ones to 3 or 5 instead of 7. If research has told us one thing recently, it is that deeper is (mostly) better.
Otherwise, play with the hyperparameters (learning rate, batch size, ...) or augment the input data randomly. Batch or layer normalization has proven very useful lately (and only introduces a handful of parameters).
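A hedged sketch of that direction: 3x3 kernels, stacked Conv/BatchNorm/Pool blocks, and filter counts that are my own guesses to be tuned against the budget:
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, BatchNormalization, Flatten, Dense

inputs = Input(shape=(28, 28, 1))
x = Conv2D(8, kernel_size=(3, 3), activation='relu', padding='same')(inputs)
x = BatchNormalization()(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Conv2D(16, kernel_size=(3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Conv2D(16, kernel_size=(3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Flatten()(x)
outputs = Dense(10, activation='softmax')(x)
model = Model(inputs=inputs, outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
model.summary()  # the total parameter count should stay well under 10,000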
Here is a blog post which aims to train a CIFAR10 ResNet classifier to reach 94% in a few seconds. Although the goal is a little different, some of the tricks might be useful for you, such as CELU activation or label smoothing.

Dealing with inserting noise to train data in Keras (Deep learning)

I am using Keras for deep learning.
I want to inject noise into the training data at each epoch during training.
So, at every epoch, the training data should differ from the previous epoch because of the random noise insertion.
This is my code:
from keras.models import Sequential
from keras.layers import Dense, Activation, BatchNormalization, GaussianNoise
from keras import initializers

model = Sequential()
# GaussianNoise draws fresh noise on every forward pass and is only active
# during training; as the first layer it needs the input shape.
model.add(GaussianNoise(SNR_std, input_shape=(1920,)))
model.add(Dense(neuron,
                kernel_initializer=initializers.he_normal(seed=seed_num),
                use_bias=False))
model.add(BatchNormalization())
model.add(Activation('relu'))
Did I do this the right way for my intention?
It seems correct to me.
One thing to note: if you perturb the inputs with noise like this, you should visualize them at least once before starting your training, so that you actually know what the model is learning from. Getting a handle on the output of that layer is key, and answers on how to do this can be found all over the internet: https://datascience.stackexchange.com/questions/20469/keras-visualizing-the-output-of-an-intermediate-layer
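A minimal sketch of such a probe, assuming the model above is built with tf.keras 2.x:
import numpy as np
import tensorflow as tf

# Probe model that exposes the output of the GaussianNoise layer (layer 0).
probe = tf.keras.Model(inputs=model.input, outputs=model.layers[0].output)

# GaussianNoise is only active in training mode, so pass training=True;
# without it, the probe would return the clean inputs unchanged.
x_batch = np.random.rand(4, 1920).astype('float32')  # stand-in for real training rows
noisy = probe(x_batch, training=True)
print(noisy[0] - x_batch[0])  # the injected noise for the first sample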

Accuracy Decreasing with higher epochs

I am a newbie to Keras and machine learning in general. I'm trying to build a binary classification model using the Sequential model. After some experimenting, I saw that on multiple runs (though not always) I was getting an accuracy of up to 97% on my validation data in the second or third epoch itself, but this then dramatically decreased, to as little as 12%. What is the reason behind this? How do I fine-tune my model?
Here's my code -
from keras.models import Sequential
from keras.layers import Flatten, Dense, Dropout

model = Sequential()
model.add(Flatten(input_shape=(6, size)))
model.add(Dense(6, activation='relu'))
model.add(Dropout(0.35))
model.add(Dense(3, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])
model.fit(x, y, epochs=60, batch_size=40, validation_split=0.2)
In my view, you can take the following factors into consideration:
Reduce your learning rate to a very small number like 0.001 or even 0.0001 (see the sketch after this list).
Provide more data.
Set the dropout rates to a value like 0.2, and keep them uniform across the network.
Try decreasing the batch size.
Use an appropriate optimizer: you may need to experiment a bit on this. Try different optimizers on the same network, and select the one that gives you the least loss.
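Here is a minimal sketch combining the learning-rate and batch-size suggestions above (Adam with lr=0.0001 is just one possible pick; x, y, and the model are as in the question):
from keras.optimizers import Adam

# Recompile with an explicit, smaller learning rate (Keras' Adam defaults to 0.001).
model.compile(loss='binary_crossentropy',
              optimizer=Adam(lr=0.0001),
              metrics=['binary_accuracy'])
model.fit(x, y, epochs=60, batch_size=20, validation_split=0.2)  # smaller batch than the original 40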
If any of the above factors work for you, please let me know about it in the comments section.
When I had this issue, I solved it by changing the optimizer from RMSprop to Adam. I also reduced the learning rate and added dropout after each fully connected layer. If your FC layers have a small number of neurons, adding dropout does not make a major difference.
Reduce the learning rate (to something like 0.001 or 0.0001) and add batch normalization after every convolutional layer.
