I am trying to train an RNN for stock price prediction for my Master's thesis. I have 6 additional input values, not just the stock prices themselves.
Using an LSTM network with the "optimal" structure found by hyperparameter tuning with Keras Tuner, I observed a significant increase in both the training and validation losses after about 4000 epochs.
My dataset consists of about 12,000 data points, and I use the Adam optimizer with the mean_absolute_error loss function.
The network is quite deep, with the following layers:
LSTM (24 units)
Dropout
LSTM (366 units)
Dropout
LSTM (150 units)
Dropout
Dense (1 unit)
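For reference, a sketch of this architecture in Keras (the dropout rates, input shape and compile settings below are placeholders, not the values the tuner actually chose):

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

# Sketch of the tuned architecture; 7 features = price + 6 additional inputs
# (an assumption), and the 0.2 dropout rates are placeholders.
model = Sequential([
    LSTM(24, return_sequences=True, input_shape=(None, 7)),
    Dropout(0.2),
    LSTM(366, return_sequences=True),
    Dropout(0.2),
    LSTM(150),
    Dropout(0.2),
    Dense(1),
])
model.compile(optimizer='adam', loss='mean_absolute_error')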
I have attached a graph of the loss (sorry for the German labels).
I would really like to understand what leads to this (for me) unexpected behaviour.
Your learning rate (often called alpha) is probably too large. Try reducing the learning rate by a factor (e.g. 10) and train again.
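For example (a sketch; 1e-4 is just a starting point to try, and `model` stands in for the network from the question):

from keras.optimizers import Adam

# The Keras default learning rate for Adam is 1e-3; try something smaller.
model.compile(optimizer=Adam(lr=1e-4), loss='mean_absolute_error')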
It may be overfitting to the training data. If your training setup uses a validation set to compute the accuracy, then after a number of epochs the model becomes overfit to the training data: it does very well on the training data but no longer generalizes to the validation and test data. To solve this you could use regularization techniques, or take the easier approach and simply stop training when the validation accuracy starts to drop.
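In Keras, stopping at the right moment can be automated with the EarlyStopping callback. A minimal sketch, where `model`, `x_train`, `y_train`, `x_val` and `y_val` stand in for your own model and data:

from keras.callbacks import EarlyStopping

# Stop once the validation loss has not improved for 20 epochs and
# roll back to the best weights seen so far.
early_stop = EarlyStopping(monitor='val_loss', patience=20, restore_best_weights=True)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=10000,
          callbacks=[early_stop])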
Today I was working on a classifier to detect whether a mushroom was poisonous given its features. The data was in a .csv file (read into a pandas DataFrame), and the link to the data can be found at the end.
I used scikit-learn's train_test_split function to split the data into training and testing sets.
I then removed the column that specified whether the mushroom was poisonous and assigned it to the yTrain and yTest variables as the training and testing labels.
I then applied one-hot encoding (using pd.get_dummies()) to the data, since the parameters were categorical.
After this, I normalized the training and testing input data.
Essentially, the training and testing input data were distinct lists of one-hot-encoded parameters, and the output data was a list of ones and zeros representing the output (one meant poisonous, zero meant edible).
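A rough sketch of that preprocessing, assuming the Kaggle CSV with its `class` column (`p` = poisonous, `e` = edible); the file name and test split are placeholders:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('mushrooms.csv')

# Label: 1 = poisonous, 0 = edible
y = (df['class'] == 'p').astype(int)

# One-hot encode the categorical features; the resulting columns are
# already 0/1, so further scaling changes little.
X = pd.get_dummies(df.drop(columns=['class'])).astype('float32')

xTrain, xTest, yTrain, yTest = train_test_split(X, y, test_size=0.2, random_state=0)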
I used Keras and a simple feed-forward network for this project. The network consists of three layers: a Dense layer (a linear layer, for PyTorch users) with 300 neurons, a Dense layer with 100 neurons, and a Dense layer with two neurons, each representing the probability that the given parameters of the mushroom signify poisonous or edible. Adam was the optimizer I used, and sparse categorical cross-entropy was my loss function.
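As a sketch, the network described above would look roughly like this in Keras (the layer sizes are from the description; the activations and training settings are assumptions):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(300, activation='relu', input_shape=(xTrain.shape[1],)),
    Dense(100, activation='relu'),
    Dense(2, activation='softmax'),   # probabilities for edible / poisonous
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(xTrain.values, yTrain.values, validation_split=0.1, epochs=60)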
I trained my network for 60 epochs. After about 5 epochs the loss was basically zero and my accuracy was 1. After training, I was worried that my network had overfitted, so I tried it on my separate testing data. The results were the same as for the training and validation data: the accuracy was 100% and the loss was negligible.
My validation loss at the end of 50 epochs is 2.258996e-07, my training loss is 1.998715e-07, and my testing loss was 4.732502e-09. I am really confused by this; is the loss supposed to be this low? I don't think I am overfitting, and since my validation loss is only a bit higher than my training loss, I don't think I am underfitting either.
Do any of you know the answer to this question? I am sorry if I have messed up in some silly way.
Link to dataset: https://www.kaggle.com/uciml/mushroom-classification
It seems that this Kaggle dataset is solvable, in the sense that you can create a model which gives the correct answer 100% of the time (if these results are to be believed). If you look at those results, you can see that the author was actually able to find models which give 100% accuracy using several methods, including decision trees.
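A quick way to reproduce that kind of result is to fit a plain decision tree on the one-hot-encoded features, roughly like this (a sketch, reusing the `X` and `y` prepared in the question):

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# X, y: one-hot-encoded features and 0/1 labels from the question's preprocessing.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores)   # if the dataset really is separable, these should all be close to 1.0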
I am using a CNN in TensorFlow with two convolution layers, a single fully-connected layer and a linear layer to predict object sizes. The labels are sizes and the features are images.
To assess the performance of the network, I am using five-fold cross validation. Using TensorBoard I plot the accuracy for both the training set and the cross-validation set.
Both accuracies increase, but the cross-validation accuracy increases more slowly. Thinking the divergence in accuracies is due to the model overfitting, I tried to regularize the weights using L2 regularization. But, this just reduced the training accuracy, while the trend in cross-validation accuracy remained the same. The cross-validation accuracy always remains below 50%.
Can anyone recommend a few methods I might consider to improve the cross-validation accuracy and hence the predictive power of the model? Thank you very much.
Without regularization: training accuracy in gray, cross-validation accuracy in green.
With regularization: training accuracy in blue, cross-validation accuracy in red.
Overfitting has multiple remedies. To name a few:
Regularization: instead of L2 regularization, you can try adding Dropout layers and see how the model fares (see the sketch after this list). Dropout layers deactivate a random subset of neurons during training, forcing the model to rely on the remaining ones as well.
Data augmentation: there are multiple techniques to augment your training data. You can either generate new images with image-processing techniques or make your existing images more "CNN-friendly". Some keywords to search for are data centering and normalization/standardization, ZCA whitening, traditional image processing such as zoom/crop, invert, color filters, shift/skew, distort and rotate functions, as well as NN-based data augmentation techniques.
Model architecture: changing your model architecture pushes it towards overfitting (more capacity) or underfitting (less capacity). Experiment with the number of layers and the size of the convolutional kernels, and consider using pre-trained networks (transfer learning) such as Inception v3, AlexNet, GoogLeNet, VGG-16, etc.
There are certainly a million other ways but this is a great place to start.
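As an illustration of the Dropout option, a minimal Keras sketch (the layer sizes and input shape are placeholders, not your architecture):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# Placeholder: two conv blocks plus one fully-connected layer, with Dropout
# added where the model tends to memorize the training set.
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),          # randomly drop half the activations during training
    Dense(1),              # linear output for the size regression
])
model.compile(optimizer='adam', loss='mse')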
I am using a CNN network to classify images into 5 classes. The size of my dataset is around 370K. I am using Adam optimizer with learning rate 0.0001 and batch size of 32. Surprisingly, I am getting improvement in validation accuracy over the epochs but validation loss is constantly growing.
My assumption is that the model is becoming less and less sure about the validation set, but the accuracy still improves because the softmax output for the predicted class remains above the threshold value.
What can be the reason behind this? Any help in this regard would be highly appreciated.
I think this is a case of overfitting, as the previous comments pointed out. Overfitting can be the result of high variance in the dataset. As you trained the CNN, the training error kept decreasing, which corresponds to fitting a more and more complex model. More complex models overfit, and this shows up when the validation error starts to increase.
The Adam optimizer takes care of the learning rate, its exponential decay and, in general, of optimizing the model, but it won't take any action against overfitting. If you want to reduce overfitting, you will need to add a regularization technique that penalizes large weight values in the model.
You can read more details about this in the deep learning book: http://www.deeplearningbook.org/contents/regularization.html
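In Keras, such a weight penalty can be added per layer, for example (a sketch; 1e-4 is just a typical starting value and should be tuned on the validation set):

from keras.layers import Dense
from keras.regularizers import l2

# L2 weight decay on a dense layer; the penalty strength is an assumption.
dense = Dense(128, activation='relu', kernel_regularizer=l2(1e-4))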
I'm currently trying to compare the behaviour of an LSTM RNN and a GRU RNN on the prediction of time series (classification 1/0: does the series go up or down). I use the fit_generator method (as in François Chollet's Keras book).
I feed 30 points to the network and the next point has to be classified as up or down. The training samples are reshuffled, while the validation samples of course are not.
If I do not change the default learning rate (1e-3 for the Adam algorithm) then the usual happens: the training set tends to be overfitted after a certain number of epochs, both for LSTM and GRU cells.
LSTM, 1 layer, 128 units
Note that my plots are averaged over 10 simulations (this way I get rid of the effect of the particular random weight initialization).
If I choose lower learning rates (see below for the impact of a different learning rate on the training and validation accuracy), it looks to me as if the model is no longer able to overfit the training set (???).
learning rate impact on GRU network
When I compare the LSTM and the GRU it's even worse: the validation set gets higher accuracy than the training set in the LSTM case. For the GRU case the curves are close, but the training accuracy is still higher.
2-layer LSTM, LR = 1e-5
2-layer GRU, LR = 1e-5
Note that this is less pronounced for the 1-layer LSTM:
1-layer LSTM, LR = 1e-5
I have tested this with 1 and 2 layers and with a different validation set, but the results are similar.
An extract of my code is below (imports added for completeness):

import keras
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

# Two stacked 32-unit LSTM layers followed by a softmax over the two classes (up / down)
modelLSTM_2a = Sequential()
modelLSTM_2a.add(LSTM(units=32, input_shape=(None, data.shape[-1]), return_sequences=True))
modelLSTM_2a.add(LSTM(units=32, input_shape=(None, data.shape[-1]), return_sequences=False))
modelLSTM_2a.add(Dense(2))
modelLSTM_2a.add(Activation('softmax'))

# Adam with a much smaller learning rate than the default 1e-3
adam = keras.optimizers.Adam(lr=1e-5, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
modelLSTM_2a.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])
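Training then uses fit_generator, roughly like this (train_gen, val_gen and the step counts below are placeholders for my generators and their sizes):

# Sketch of the training call; generators and step counts are placeholders.
history = modelLSTM_2a.fit_generator(train_gen,
                                     steps_per_epoch=500,
                                     epochs=100,
                                     validation_data=val_gen,
                                     validation_steps=100)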
Would someone have a clue about what may be happening?
I'm really puzzled by this behaviour, especially by the impact of the learning rate in the LSTM case.
I have trained an image classifier CNN for 50 epochs and achieved 65% accuracy with 64% accuracy on the validation data.
My problem is when using model.predict on a single sample (one image) the network behaves as though it is untrained.
I fed thousands of images to model.predict, one at a time, and the average classification accuracy was only 46%.
I tried both model.save and saving the JSON model and weights separately, but there was no difference.
My only thought as to why this is occurring is that the BatchNorm layers in my model are affecting the consistency of the data.
My model contains six CNN layers, with three max-pooling layers and one final fully-connected layer, with a BatchNormalisation layer between every layer (seven in total). I used a batch size of 128 during training, but of course the batch size for the prediction samples is 1. I don't know much about BatchNorm, but I wonder if there is some kind of normalisation happening on the training and testing data that is not happening on the predictions?
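For reference, a minimal sketch of the comparison between predicting in one large batch and one image at a time (here `model` is the loaded network and `x_test` the evaluation images; both names are placeholders):

import numpy as np

# Predict the whole evaluation set in one call ...
batched = np.argmax(model.predict(x_test, batch_size=128), axis=1)

# ... and the same images one at a time, as in my evaluation loop.
single = np.array([np.argmax(model.predict(img[np.newaxis])) for img in x_test])

# BatchNorm in inference mode uses the stored moving mean/variance, so
# predict() should not depend on batch size; if these two agree, the drop
# is more likely a preprocessing mismatch than BatchNorm itself.
print((batched == single).mean())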