I have trained a CNN image classifier for 50 epochs and reached 65% training accuracy and 64% accuracy on the validation data.
My problem is that when I use model.predict on a single sample (one image), the network behaves as though it were untrained.
I fed model.predict thousands of images, one at a time, and the average classification accuracy was only 46%.
I tried both model.save and saving the JSON model and weights separately, but there was no difference.
My only thought as to why this is occurring is that the BatchNorm layers in my model are affecting the consistency of the data.
My model contains six CNN layers, three max-pooling layers, and one final fully connected layer, with a BatchNormalisation layer between every layer (seven in total). I used a batch size of 128 during training, but of course the batch size for the prediction samples is 1. I don't know much about BatchNorm, but I wonder if there is some kind of normalisation happening on the training and testing data that is not applied to the predictions?
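For reference, this is roughly how I run the single-sample prediction (load_single_image and the 1/255 rescaling below are placeholders for my actual loading and preprocessing code):

```python
import numpy as np
from tensorflow import keras

model = keras.models.load_model("my_model.h5")   # placeholder path to the saved model

img = load_single_image()                        # placeholder: returns one image as an array of shape (H, W, C)
img = img.astype("float32") / 255.0              # placeholder for whatever preprocessing was used during training
batch = np.expand_dims(img, axis=0)              # model.predict expects a batch dimension: shape (1, H, W, C)

probs = model.predict(batch)
pred = int(np.argmax(probs, axis=-1)[0])
```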
I am trying to train an RNN for stock price prediction for my Master's thesis. I have six additional input features, not just the stock price itself.
Using an LSTM network with the "optimal" structure found by hyperparameter tuning with Keras Tuner, I observed a significant increase in the training and validation losses in my case after 4000 epochs.
My dataset consists of about 12,000 data points, and I use the Adam optimizer with the mean_absolute_error loss function.
The network is quite deep, with the following layers:
LSTM (24 units)
Dropout
LSTM (366 units)
Dropout
LSTM (150 units)
Dropout
Dense (1 unit)
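For reference, the stack sketched in Keras (the dropout rates, window length, and feature count shown here are placeholders, not the tuned values):

```python
from tensorflow import keras

timesteps, n_features = 30, 7   # e.g. 30-step windows of the price plus the 6 extra inputs (assumption)

model = keras.Sequential([
    keras.Input(shape=(timesteps, n_features)),
    keras.layers.LSTM(24, return_sequences=True),
    keras.layers.Dropout(0.2),
    keras.layers.LSTM(366, return_sequences=True),
    keras.layers.Dropout(0.2),
    keras.layers.LSTM(150),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mean_absolute_error")
```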
I attached a graph of the loss (sorry for the German labels).
I would really like to understand what leads to this, for me, unexpected behaviour.
Your learning rate (often called alpha) is probably too large. Try scaling the learning rate down by some factor and training again.
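For example, with Keras you could drop Adam's default step size by an order of magnitude (the value and the tiny stand-in model below are illustrative):

```python
from tensorflow import keras

# Stand-in model; only the optimizer line matters here. Keras' default Adam learning rate is 1e-3.
model = keras.Sequential([keras.Input(shape=(7,)), keras.layers.Dense(1)])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
              loss="mean_absolute_error")
```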
It may be because of overfitting to the training data. If your training setup uses a validation set for computing the accuracy, then after a number of epochs the model becomes overfit to the training data, which means it does very well on the training data but does not generalize well to validation and test data. To solve this issue you could use regularization techniques, or the easier approach is to simply stop training when the validation accuracy starts to drop.
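In Keras, stopping automatically when the validation metric stops improving can be done with an EarlyStopping callback (the patience value below is illustrative):

```python
from tensorflow import keras

# Stop once the validation loss has not improved for `patience` epochs and keep the best weights seen.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss",
                                            patience=50,
                                            restore_best_weights=True)

# Usage (assuming the model and data from the question):
# model.fit(x_train, y_train, validation_split=0.2, epochs=10000, callbacks=[early_stop])
```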
Today I was working on a classifier to detect whether or not a mushroom was poisonous given its features. The data was in a .csv file (read into a pandas DataFrame), and the link to the data can be found at the end.
I used scikit-learn's train_test_split function to split the data into training and testing sets.
I then removed the column that specified whether or not the mushroom was poisonous and used it as the training and testing labels, assigning it to yTrain and yTest variables.
I then applied one-hot encoding (using pd.get_dummies()) to the data, since the parameters were categorical.
After this, I normalized the training and testing input data.
Essentially, the training and testing input data were distinct lists of one-hot-encoded parameters, and the output data was a list of ones and zeros representing the labels (one meant poisonous, zero meant edible).
I used Keras and a simple feed-forward network for this project. The network comprises three layers: a Dense layer (a linear layer, for PyTorch users) with 300 neurons, a Dense layer with 100 neurons, and a Dense layer with two neurons, each representing the probability that the given mushroom's parameters signify poisonous or edible. Adam was the optimizer I used, and sparse categorical cross-entropy was my loss function.
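For reference, my pipeline looked roughly like the sketch below (the exact column name, activations, split sizes, and the order of encoding vs. splitting are approximations rather than my exact code):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow import keras

df = pd.read_csv("mushrooms.csv")                               # file name from the Kaggle dataset (assumption)
y = (df["class"] == "p").astype(int)                            # 1 = poisonous, 0 = edible
X = pd.get_dummies(df.drop(columns=["class"])).astype("float32")  # one-hot encode the categorical features

xTrain, xTest, yTrain, yTest = train_test_split(X, y, test_size=0.2, random_state=0)

model = keras.Sequential([
    keras.Input(shape=(X.shape[1],)),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(xTrain, yTrain, epochs=60, validation_split=0.1)
```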
I trained my network for 60 epochs. After about 5 epochs the loss was basically zero and my accuracy was 1. After training, I was worried that my network had overfitted, so I tried it on my separate testing data. The results were the same as on the training and validation data: the accuracy was 100% and my loss was negligible.
My validation loss at the end of 50 epochs is 2.258996e-07, my training loss is 1.998715e-07, and my testing loss is 4.732502e-09. I am really confused by this state of affairs: is the loss supposed to be this low? I don't think I am overfitting, and since my validation loss is only a bit higher than my training loss, I don't think I am underfitting either.
Do any of you know the answer to this question? I am sorry if I have messed up in some silly way.
Link to dataset: https://www.kaggle.com/uciml/mushroom-classification
It seems that this Kaggle dataset is solvable, in the sense that you can create a model which gives the correct answer 100% of the time (if these results are to be believed). If you look at those results, you can see that the author was able to find models which give 100% accuracy using several methods, including decision trees.
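As a quick sanity check of that claim, a decision tree on the one-hot-encoded features should score at (or extremely close to) 100% (this assumes a local copy of the Kaggle CSV named mushrooms.csv):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("mushrooms.csv")                      # local copy of the Kaggle CSV (assumption)
y = (df["class"] == "p").astype(int)
X = pd.get_dummies(df.drop(columns=["class"]))

xTr, xTe, yTr, yTe = train_test_split(X, y, test_size=0.3, random_state=42)
tree = DecisionTreeClassifier(random_state=42).fit(xTr, yTr)
print(tree.score(xTe, yTe))                            # a score at or extremely close to 1.0 supports the claim
```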
Let's say I have a training set (with its corresponding training labels) for a defined neural network (the architecture of the neural network does not matter for answering this question). Let's call the neural network 'model'.
In order to avoid any misunderstandings, let's say that I supply the initial weights and biases for 'model'.
Experiment 1
I use the training set and the training labels to train 'model' for 40 epochs. After the training, the neural network will have a specific set of weights and biases for the entire network; let's call it WB_Final_experiment1.
Experiment 2
I use the training set and the training labels to train 'model' for 20 epochs. After the training, the neural network will have a specific set of weights and biases for the entire network; let's call it WB_Intermediate.
Now I load WB_Intermediate into 'model' and train for another 20 epochs. After the training, the neural network will have a specific set of weights and biases for the entire network; let's call it WB_Final_experiment2.
Considerations: every single parameter, hyperparameter, activation function, loss function... is exactly the same for both experiments, except the number of epochs.
Question: are WB_Final_experiment1 and WB_Final_experiment2 exactly the same?
If you follow this tutorial here, you will find the results of the two experiments given below:
Experiment 1
Experiment 2
In the first experiment the model ran for 4 epochs, and in the second experiment the model ran for 2 epochs and then trained for 2 more epochs using the last weights from the previous training. You will find that the results vary, but only by a very small amount, and they will always vary because of the randomized initialization of weights. The predictions of the two models will, however, lie very close to each other.
If the models are initialized with the same weights, then the results at the end of 4 epochs will be the same for both models.
On the other hand, if you train for 2 epochs, then shut down your training session without saving the weights, and then train for 2 more epochs after restarting the session, the predictions won't be the same. To avoid that, always load the saved weights before continuing training, using model.load_weights("path to model").
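A minimal sketch of that save-and-resume pattern in Keras (the tiny toy model and the file name are placeholders; the real architecture does not matter for the question):

```python
import numpy as np
from tensorflow import keras

def build_model():
    # Toy stand-in for 'model'.
    m = keras.Sequential([keras.Input(shape=(4,)),
                          keras.layers.Dense(8, activation="relu"),
                          keras.layers.Dense(1)])
    m.compile(optimizer="adam", loss="mse")
    return m

x, y = np.random.rand(256, 4), np.random.rand(256, 1)

# Experiment 2: 20 epochs, save, reload, then the remaining 20 epochs.
model = build_model()
model.fit(x, y, epochs=20, verbose=0)
model.save_weights("wb_intermediate.weights.h5")   # persist WB_Intermediate

model = build_model()                              # e.g. after restarting the session
model.load_weights("wb_intermediate.weights.h5")   # restore WB_Intermediate before continuing
model.fit(x, y, epochs=20, verbose=0)              # -> WB_Final_experiment2
```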
TL;DR
If the models are initialized with the exact same weights, then the output at the end of the same number of training epochs will be the same. If they are randomly initialized, the output will only vary slightly.
If the operations you are doing are entirely deterministic, then yes. Epochs are implemented as an iteration count for a for loop around your training algorithm; you can see this in implementations in PyTorch.
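For illustration, a toy PyTorch loop where the epoch is literally just the outer loop counter (plain SGD here, so there is no optimizer state beyond the weights; model and data are placeholders):

```python
import torch

torch.manual_seed(0)                      # with the same seed and fully deterministic ops, 40 epochs in
                                          # one go or 20 + 20 from the saved weights should end the same
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()
x, y = torch.randn(64, 10), torch.randn(64, 1)

for epoch in range(40):                   # "epochs" are just the outer iteration count
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```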
Typically no, the model weights will not be the same, as the optimiser accrues its own values during training. You will need to save those too to truly resume from where you left off.
See the PyTorch documentation regarding saving and resuming here; the concept is not limited to the PyTorch framework.
Specifically:
It is important to also save the optimizer’s state_dict, as this contains buffers and parameters that are updated as the model trains.
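Following that documentation, a minimal checkpoint-and-resume sketch looks like this (toy model and optimizer so the snippet is self-contained; the file name is arbitrary):

```python
import torch

# Toy model/optimizer; Adam keeps per-parameter buffers that are worth saving.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
epoch = 20

# Save a checkpoint that can truly resume training (model weights *and* optimizer state).
torch.save({"epoch": epoch,
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict()},
           "checkpoint.pt")

# Later, or in a new session:
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
start_epoch = checkpoint["epoch"] + 1
```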
I have a dataset consisting only of features extracted from a CNN (an array of 4096 values), concatenated with another array of 512 Gist features of the images (4608 features per sample in total). The features were extracted from 2D images (scenes); I don't have access to the images themselves, only their features. The goal is to train a classifier that can tell whether a scene is memorable or not (so 2 categories).
The problem is:
No matter what model architecture I use (Dense and Dropout layers with all sorts of parameters), my model always ends up biased towards only one category (it classifies everything as 0 or everything as 1). The accuracy stays at 70%, while the loss gets below 1.0.
Things I tried:
Different parameters for the layers
Different number of hidden layers
Different loss functions (binary_crossentropy, sparse_categorical_crossentropy, mean_squared_error, ...)
Different optimizers with different learning rates ranging from 0.0001 to 0.01 (Adam, SGD)
Training on the first 4096 CNN features only, on the 512 Gist features only, and on both concatenated
Notes:
The features are extracted from the last convolutional layer of CaffeNet.
This model is going to be used on features extracted with the same CaffeNet layer.
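For reference, one of the configurations I tried looks roughly like this (the layer sizes, dropout rates, and learning rate here are just one example from the ranges above):

```python
from tensorflow import keras

n_features = 4096 + 512      # CNN features + Gist features

model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1, activation="sigmoid"),      # 2 categories -> one sigmoid output
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy",
              metrics=["accuracy"])
```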
I am using a CNN in TensorFlow with two convolutional layers, a single fully-connected layer, and a linear output layer to predict object sizes. The labels are sizes and the features are images.
To assess the performance of the network, I am using five-fold cross-validation. Using TensorBoard, I plot the accuracy for both the training set and the cross-validation set.
Both accuracies increase, but the cross-validation accuracy increases more slowly. Thinking the divergence in accuracies was due to the model overfitting, I tried to regularize the weights using L2 regularization. However, this just reduced the training accuracy, while the trend in the cross-validation accuracy remained the same. The cross-validation accuracy always remains below 50%.
Can anyone recommend a few methods I might consider to improve the cross-validation accuracy and hence the predictive power of the model? Thank you very much.
Without regularization: training accuracy is in gray, cross-validation accuracy is in green.
With regularization: training accuracy is in blue, cross-validation accuracy is in red.
Overfitting has multiple remedies. To name a few:
Regularization: instead of L2 regularization, you can try adding Dropout layers and see how the model fares (see the sketch at the end of this answer). Dropout layers deactivate a random subset of neurons during training, forcing the model to rely on the other ones as well.
Data augmentation: there are multiple techniques to augment your training data. You can either generate new images with image processing techniques or make your existing images more "CNN-friendly". Some keywords to search for are data centering and normalization/standardization, ZCA whitening, traditional image processing such as zoom/crop, invert, color filters, shift/skew, distort and rotate functions, as well as NN-based data augmentation techniques.
Model architecture: changing your model architecture will result in a higher (overfitting) or lower (underfitting) loss of generality. Experiment with the number of layers and the size of the convolutional kernels, and consider using pre-trained networks (transfer learning) such as Inception v3, AlexNet, GoogLeNet, VGG-16, etc.
There are certainly a million other ways but this is a great place to start.
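For the Dropout suggestion above, here is a Keras sketch of the kind of model described in the question with Dropout in place of L2 regularization (filter counts, rates, and input shape are illustrative, not tuned):

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(64, 64, 1)),
    keras.layers.Conv2D(32, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Dropout(0.25),        # randomly zeroes 25% of activations during training only
    keras.layers.Conv2D(64, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Dropout(0.25),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1),             # linear output, matching the size-prediction setup in the question
])
model.compile(optimizer="adam", loss="mse")
```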