I am a newbie to Keras and machine learning in general. I'm trying to build a binary classification model using the Sequential model. After some experimenting, I saw that on multiple runs (not always) I was getting an accuracy as high as 97% on my validation data in the second or third epoch itself, but this then dropped dramatically, to as low as 12%. What is the reason behind this? How do I fine-tune my model?
Here's my code -
model = Sequential()
model.add(Flatten(input_shape=(6,size)))
model.add(Dense(6,activation='relu'))
model.add(Dropout(0.35))
model.add(Dense(3,activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(1,activation='sigmoid'))
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['binary_accuracy'])
model.fit(x, y,epochs=60,batch_size=40,validation_split=0.2)
In my opinion, you can take the following factors into consideration.
Reduce your learning rate to a very small number like 0.001 or even 0.0001.
Provide more data.
Set Dropout rates to a number like 0.2. Keep them uniform across the network.
Try decreasing the batch size.
Use an appropriate optimizer: you may need to experiment a bit on this. Try different optimizers on the same network and select the one that gives the lowest loss (a minimal sketch of this and the learning-rate/batch-size points follows this list).
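For reference, here is a minimal sketch of the learning-rate, batch-size, and optimizer suggestions applied to the model above; the 1e-4 learning rate and batch size of 20 are illustrative values, not tuned for this data.
from keras.optimizers import Adam

# Illustrative only: same model as above, recompiled with an explicit,
# smaller learning rate and trained with a smaller batch size.
model.compile(loss='binary_crossentropy',
              optimizer=Adam(lr=1e-4),
              metrics=['binary_accuracy'])
model.fit(x, y, epochs=60, batch_size=20, validation_split=0.2)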
If any of the above factors work for you, please let me know about it in the comments section.
When I had this issue, I solved it by changing the optimizer from RMSprop to Adam. I also reduced the learning rate and added dropout after each fully connected layer. If your FC layers have a small number of neurons, adding dropout does not make a major difference.
Reduce the learning rate (like 0.001 or 0.0001) and add batch normalization after every convolution layer.
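For illustration, the convolution + batch-normalization pattern looks like the block below. Note that the model in the question has no convolution layers, so this is a hypothetical sketch of the pattern, not a drop-in change.
from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, Activation

# Hypothetical pattern only: batch normalization inserted after a convolution.
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', input_shape=(28, 28, 1)))
model.add(BatchNormalization())
model.add(Activation('relu'))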
I have a problem with increasing the accuracy of my VGG16 model.
Even after adding some Dense layers, I couldn't improve it. Can you help me get a better result? I tried using Dropout but I couldn't increase the accuracy. Could you look through the code here if you don't want to open the linked file?
I think the model may be overfitting or underfitting.
Here is my model shown below.
base_model = VGG16(
    include_top=False,
    weights="imagenet",
    input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3))
#freeze the base model
base_model.trainable = False
model=Sequential()
model.add(base_model)
model.add(Flatten())
model.add(Dense(512,activation='relu'))
#model.add(Dropout(0.2))
model.add(Dense(256,activation='relu'))
#model.add(Dropout(0.2))
model.add(Dense(128,activation='relu'))
#model.add(Dropout(0.2))
model.add(Dense(num_classes,activation='softmax'))
model.summary()
Here is my project link : Project
There are a number of different things you can do, depending on your problem scope. What you are showing is a basic transfer-learning model with a couple of dense layers.
Regularisation is one thing you have already done by using Dropout, but you have turned it off. Other simple regularisation tools are L1 and L2 weight penalties. Beyond that, you can use a lower learning rate, reduce the batch size, use batch normalisation, change the optimisation function, or all of the above at the same time.
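As a rough sketch of two of those suggestions, an L2 penalty on a dense layer plus a lower learning rate: the 1e-4 values are assumptions, and the categorical cross-entropy loss is assumed since the question's compile call isn't shown.
from keras import regularizers
from keras.optimizers import Adam
from keras.layers import Dense

# Illustrative: L2-regularised dense layer and a smaller learning rate,
# applied to the Sequential head built in the question.
model.add(Dense(512, activation='relu',
                kernel_regularizer=regularizers.l2(1e-4)))
model.compile(optimizer=Adam(lr=1e-4),
              loss='categorical_crossentropy',
              metrics=['accuracy'])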
Creating a neural network model is the easy part. The more important and harder-to-master skill is optimising it to perform well on general data by tweaking each parameter until you produce better results.
Try looking at these three guides (or other ones that you can find about hyperparameter optimisation) to understand more:
Optimising Neural Networks Where To Start?
A Comprehensive Guide on Neural Networks Performance Optimization
An Overview of Regularization Techniques in Deep Learning
Dropout is a regularization technique that introduces noise into the network during training in order to avoid over-specialization to the training inputs (overfitting). It works by randomly setting a fraction of a layer's activations to zero on each update; in Keras, that fraction is the rate you pass to the Dropout layer.
In your case, a dropout rate of 0.2 means roughly 20% of the units feeding the next layer are zeroed out on each update, which is a fairly mild amount of regularization. Typical values range from about 0.2 to 0.5; a rate of 0 performs no regularization at all, while a rate close to 1 zeroes out almost everything and prevents the network from training.
Try reintroducing Dropout with rates in that range (e.g. 0.2, 0.3, 0.5) and compare the results against no regularization at all (just comment out the Dropout lines) to see how it affects validation accuracy.
This fix to dropout may come in handy for boosting your model's performance.
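A sketch of the head from the question with the commented-out Dropout layers re-enabled at moderate rates; the exact rates here are illustrative, not tuned.
from keras.layers import Dense, Dropout

# In Keras the Dropout rate is the fraction of units dropped, so values
# around 0.2-0.5 are typical; tune them against validation accuracy.
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))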
I am creating a CNN using TensorFlow and, when training, I find that the training loss is still improving (i.e. still decreasing), while the test/validation loss has converged and is no longer improving. (Learning curve plot attached below.)
Does anyone know why this might be the case and how could I possibly fix it, to have the validation loss reduce along with the training? Would be greatly appreciated!
Plot of my model's learning curve:
The plot of losses is very typical. Your model appears to be performing very well, with a very low MSE loss. At this point you have essentially reached the limit of your model's performance. One thing which may help is to use an adjustable learning rate. The Keras callback ReduceLROnPlateau can be set up to monitor the validation loss. If the validation loss fails to decrease for a 'patience' number of epochs, the learning rate will be reduced by a factor 'factor', where factor is a number less than 1. Documentation is here.
You may also want to use the Keras EarlyStopping callback. This callback can be set to monitor validation loss and halt training if it fails to decrease for a 'patience' number of epochs. If you set restore_best_weights=True, it will leave your model with the weights from the epoch with the lowest validation loss, which prevents you from ending up with an overfit model. My recommended code is shown below.
rlronp=tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=1)
estop=tf.keras.callbacks.EarlyStopping(monitor="val_loss",patience=3,restore_best_weights=True)
callbacks=[rlronp, estop]
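For concreteness, a sketch of passing these callbacks to model.fit; the data variable names, epoch count, and batch size are placeholders, since the original post does not show its fit call.
# Placeholders: x_train/y_train/x_val/y_val stand in for the poster's data.
history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=50,
                    batch_size=32,
                    callbacks=callbacks)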
In model.fit, include callbacks=callbacks as shown above. I suspect neither of the above will provide much improvement; you will probably have to try some changes to your model as well. Adding a Dropout layer may help to some degree to reduce over-fitting, as would including regularization. Documentation for that is here. Of course, the standard approach of getting a larger data set may also help, but is not always easy to achieve. If you are working with images, you could try image augmentation using, say, the Keras ImageDataGenerator or the TensorFlow image augmentation layers. Documentation for that is here. One thing I found that helps in the case of images is to crop your images to just the region of interest (ROI). For example, if you were doing face recognition, cropping the images to just the face helps significantly.
This means you're hitting your architecture's limit. The training loss will keep decreasing (this is known as overfitting), which will eventually increase the validation loss. Make changes to the parameters or consider altering your layers (adding, removing, etc.), and maybe even look into ways you could alter the dataset.
When this happened to me a while ago, I added an LSTM layer to my CNN architecture and also incorporated K-means validation. This is not a walkthrough; you need to figure this out for your specific problem. Good luck.
I am trying to create a Keras neural network to predict the distance by road between two points in a city. I am using Google Maps to get the travel distance and then train a neural network on it.
import pandas as pd
arr=[]
for i in range(0, 100):
    arr.append(generateTwoPoints(55.901819, 37.344735, 55.589537, 37.832254))
df=pd.DataFrame(arr,columns=['p1Lat','p1Lon','p2Lat','p2Lon', 'distnaceInMeters', 'timeInSeconds'])
print(df)
Neural network architecture:
from keras.optimizers import SGD
sgd = SGD(lr=0.00000001)
from keras.models import Sequential
from keras.layers import Dense, Activation
model = Sequential()
model.add(Dense(100, input_dim=4 , activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(1,activation='sigmoid'))
model.compile(loss='mse', optimizer='sgd', metrics=['mse'])
Then I split the data into train/test sets:
Xtrain=train[['p1Lat','p1Lon','p2Lat','p2Lon']]/100
Ytrain=train[['distnaceInMeters']]/100000
Xtest=test[['p1Lat','p1Lon','p2Lat','p2Lon']]/100
Ytest=test[['distnaceInMeters']]/100000
Then I fit the data to the model, but the loss stays the same:
history = model.fit(Xtrain, Ytrain,
                    batch_size=1,
                    epochs=1000,
                    # We pass some validation data for
                    # monitoring validation loss and metrics
                    # at the end of each epoch
                    validation_data=(Xtest, Ytest))
I later print the data:
prediction = model.predict(Xtest)
print(prediction)
print (Ytest)
But the result is the same for all the inputs:
[[0.26150784]
[0.26171574]
[0.2617755 ]
[0.2615582 ]
[0.26173398]
[0.26166356]
[0.26185763]
[0.26188275]
[0.2614446 ]
[0.2616575 ]
[0.26175532]
[0.2615183 ]
[0.2618127 ]]
distnaceInMeters
2 0.13595
6 0.27998
7 0.48849
16 0.36553
21 0.37910
22 0.40176
33 0.09173
39 0.24542
53 0.04216
55 0.38212
62 0.39972
64 0.29153
87 0.08788
I can not find the problem. What is it? I am new to machine learning.
You are making a very elementary mistake: since you are in a regression setting, you should not use a sigmoid activation for your final layer (that is used for binary classification cases); change your last layer to
model.add(Dense(1,activation='linear'))
or even
model.add(Dense(1))
since, according to the docs, if you do not specify the activation argument it defaults to linear.
Various other advice offered already in the other answer and the comments may be useful (lower LR, more layers, other optimizers e.g. Adam), and you certainly need to increase your batch size; but nothing will work with the sigmoid activation function you currently use for your last layer.
Irrelevant to the issue, but in regression settings you don't need to repeat your loss function as a metric; this
model.compile(loss='mse', optimizer='sgd')
will suffice.
It would be very useful if you could post the progression of the loss and MSE (for both the training and validation/test sets) throughout training. Even better, visualize it as per https://machinelearningmastery.com/display-deep-learning-model-training-history-in-keras/ and post the visualization here.
In the meantime, based on the facts:
1) You say the loss isn't decreasing (I'm assuming on the training set, during training, based on your compile args).
2) You say that the prediction "accuracy" on your test set is bad.
3) My experience/intuition (not an empirical assessment) tells me that your two-layer dense model is a little too small to capture the complexity inherent in your data, i.e. your model is suffering from too high a bias: https://towardsdatascience.com/understanding-the-bias-variance-tradeoff-165e6942b229
The fastest and easiest thing you can try is to add both more layers and more nodes to each layer.
However, I should note that there is a lot of causal information that can affect driving distance and driving time beyond just the distance between two coordinates, which might be the feature your neural network will most readily extract. For example: whether you drive on a highway or side streets, traffic lights, whether the roads twist and turn or go straight... To infer all of that from just this data, you will need enormous amounts of data (examples), in my opinion. If you could add input columns with, e.g., the distance to the nearest highway from both points, you might be able to train with less data.
I would also recommend that you double-check that you are feeding as input what you think you are feeding (and its shape), and that you apply some standardization, e.g. with a function from sklearn, which might help the model learn faster and converge to a higher "accuracy".
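For example, a minimal sketch of standardizing the inputs with sklearn, fitting the scaler on the training split only so no test statistics leak in:
from sklearn.preprocessing import StandardScaler

# Fit on the training coordinates only, then apply the same transform to test.
scaler = StandardScaler()
Xtrain_scaled = scaler.fit_transform(Xtrain)
Xtest_scaled = scaler.transform(Xtest)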
If and when you post more code or the training history, I can help you more (also let me know how many training samples you have).
EDIT 1: Try changing the batch size to a larger number, preferably batch_size=32, if it fits in your memory. You can use a small batch size (such as 1) when working with an "info-rich" input like an image, but with a very "info-poor" datum like 4 floats (2 coordinates), the gradient of each batch (with batch_size=1) will point in a practically random (pseudo...) direction and not necessarily get any closer to a local minimum. Only when taking the gradient on the collective loss of a larger batch (like 32, and perhaps more) will you get a gradient that points at least approximately in the direction of the local minimum and converge to a better result. Also, I suggest that you don't adjust the learning rate manually and perhaps change to an optimizer like "adam" or "RMSProp".
EDIT 2: @Desertnaut made an excellent point that I totally missed, a correction without which your code will not work properly. He deserves the credit, so I will not include it here; please refer to his answer. Also, don't forget to raise your batch size and not "manually mess" with your learning rate; "adam", for example, will do that for you.
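Putting the two edits together, a minimal sketch, assuming the last layer has already been changed to a linear activation as in the other answer; the epoch count is kept from the question and the batch size of 32 is the suggested value.
# Let Adam manage the learning rate and use a larger batch size.
model.compile(loss='mse', optimizer='adam')
history = model.fit(Xtrain, Ytrain,
                    batch_size=32,
                    epochs=1000,
                    validation_data=(Xtest, Ytest))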
Given the MNIST dataset in Keras, the challenge is to develop a CNN model with fewer than 10k parameters and at least 99% validation accuracy.
I tried building such a model, but I am only getting an accuracy of 98.71%.
import keras
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, concatenate

def create_model():
    # MNIST input (28x28 grayscale images)
    inputs = Input(shape=(28, 28, 1))

    lay1 = Conv2D(2, kernel_size=(1,1), activation='relu', padding='same')(inputs)
    lay1 = Conv2D(2, kernel_size=(7,7), strides=(2,2), activation='relu', padding='same')(lay1)
    lay1 = Conv2D(2, kernel_size=(1,1), activation='relu', padding='same')(lay1)
    lay1 = MaxPooling2D(pool_size=(7,7), strides=(2,2), padding='same')(lay1)

    lay2 = Conv2D(4, kernel_size=(1,1), activation='relu', padding='same')(inputs)
    lay2 = Conv2D(4, kernel_size=(7,7), strides=(2,2), activation='relu', padding='same')(lay2)
    lay2 = Conv2D(4, kernel_size=(1,1), activation='relu', padding='same')(lay2)
    lay2 = MaxPooling2D(pool_size=(7,7), strides=(2,2), padding='same')(lay2)

    lay3 = Conv2D(6, kernel_size=(1,1), activation='relu', padding='same')(inputs)
    lay3 = Conv2D(6, kernel_size=(7,7), strides=(2,2), activation='relu', padding='same')(lay3)
    lay3 = Conv2D(6, kernel_size=(1,1), activation='relu', padding='same')(lay3)
    lay3 = MaxPooling2D(pool_size=(7,7), strides=(2,2), padding='same')(lay3)

    # merge the three parallel branches
    fc = concatenate([lay1, lay2, lay3])
    fc = Flatten()(fc)
    fc = Dense(10, activation='relu')(fc)
    outputs = Dense(10, activation='softmax')(fc)

    model = Model(inputs=inputs, outputs=outputs)
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])
    return model
The total number of parameters is 8,862; the batch size used above is 32 and the number of epochs is 10.
Can you please suggest ways to improve the model with the constraints on the number of parameters so that the validation accuracy is 99% or above?
You should be able to add another Conv/Pool block if you reduce the kernel_size in the previous ones to 3 or 5 instead of 7. If research has told us one thing recently, it is that deeper is (mostly) better.
Otherwise, play with the hyperparameters (learning rate, batch size, ...) or randomly augment the input data. Batch or layer normalization have proven very useful lately (and they only introduce a handful of parameters).
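As an illustrative pattern only (the filter counts are placeholders, so re-check the parameter budget with model.summary() after any change), a deeper stack of small-kernel convolutions with batch normalization might look like this:
from keras.layers import Input, Conv2D, MaxPooling2D, BatchNormalization

# Sketch: smaller 3x3 kernels, an extra block, and batch normalization.
inputs = Input(shape=(28, 28, 1))
x = Conv2D(8, (3, 3), activation='relu', padding='same')(inputs)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)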
Here is a blog post which aims to train a CIFAR10 ResNet classifier to reach 94% accuracy in a few seconds. Although the goal is a little bit different, some of its tricks might be useful for you, such as CELU activation or label smoothing.
So I have a dataset of about 140,000 samples with 5 inputs: the velocity of the car, the acceleration of the car, the velocity of the lead car (gathered with radar), the distance to the lead car, and the acceleration of the lead car. The output ranges from 0 to 1, 0 being maximum braking and 1 being maximum acceleration.
I'm a beginner to neural nets, so I'm having trouble optimizing my model to get the best accuracy/loss for this data. I've been experimenting with changing the optimizer, activation function, number of hidden layers, number of nodes in the layers, etc but nothing seems to be lowering the loss over time.
Here's my current model:
opt = keras.optimizers.Adam(lr=0.001, decay=1e-6)
model = Sequential()
model.add(Dense(5, activation="tanh", input_shape=(x_train.shape[1:])))
for i in range(40):
    model.add(Dense(60, activation="relu"))
model.add(Dense(1))
I'm not too worried about overfitting the data right now, as I can work on that later; I'm just trying to basically memorize the data to see how low I can get the loss, and then predict on the training data to make sure the model can return the correct output. However, the lowest I've gotten the validation loss is around 0.015, which definitely isn't returning the correct output in my tests; it's about 90% accurate.
Is there anything I'm doing wrong? Should I be increasing my model size, or decreasing it? Nothing I've tried seems to be working. I've also made sure to normalize my 5 inputs and 1 output independently. It seems it never learns anything after a few epochs.
Thanks if anyone decides to help me on this very specific problem.
I'd like to help but there's not too much info there.
First, what's the aim of the network? It's hard to tell what you're trying to reduce the loss of. What's your loss function? What are your labels? This seems like a classic reinforcement learning question rather than a traditional supervised learning problem. How is the data structured? I'm assuming they're "runs" with some sort of score?
As a blanket observation, that's probably too many layers.
Most of the time with networks, the biggest improvement comes from cleaning your data (which we'll assume is fine) and adjusting your loss function. It's best to start with a simple model and figure out whether you need more.
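For example, a much smaller baseline to start from might look like the sketch below; the layer sizes, loss, and optimizer are assumptions, not tuned for this data.
from keras.models import Sequential
from keras.layers import Dense

# Small baseline: 5 inputs, two modest hidden layers, a [0, 1]-bounded output.
model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(5,)))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='mse')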
There's also this, which is reinforcement learning, but it's a good example: https://github.com/lexfridman/deeptraffic