I have a regression model and I am writing code for the following algorithm:
Create 10 random splits of the training data into training and validation sets. Choose the best value of alpha from the set {0.1, 1, 3, 10, 33, 100, 333, 1000, 3333, 10000, 33333}.
To choose the best alpha hyperparameter value, you have to do the following (a code sketch of the full procedure is given after the task description below):
• For each hyperparameter value, perform 10 random splits of the training data into training and validation sets, as described above.
• For each hyperparameter value, use its 10 random splits to compute the average training and validation accuracy.
• On a graph, plot both the average training accuracy (in red) and average validation accuracy (in blue) w.r.t. each hyperparameter setting. Comment on this graph by identifying regions of overfitting and underfitting.
• Print the best value of alpha hyperparameter.
2- Evaluate the prediction performance on test data and report the following:
• Total number of non-zero features in the final model.
• The confusion matrix.
• Precision, recall and accuracy for each class.
Finally, discuss whether there is any sign of underfitting or overfitting, with appropriate reasoning.
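Here is a minimal sketch of that procedure, not the asker's actual code. It assumes alpha acts as an L1 regularization strength applied through LogisticRegression's C = 1/alpha, and that X_train, y_train, X_test, y_test already exist; adjust for whichever estimator you actually use:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

alphas = [0.1, 1, 3, 10, 33, 100, 333, 1000, 3333, 10000, 33333]
avg_train, avg_val = [], []

for alpha in alphas:
    train_scores, val_scores = [], []
    for split in range(10):                      # 10 random splits per alpha value
        X_tr, X_val, y_tr, y_val = train_test_split(
            X_train, y_train, test_size=0.2, random_state=split)
        model = LogisticRegression(C=1.0 / alpha, penalty="l1", solver="liblinear")
        model.fit(X_tr, y_tr)
        train_scores.append(model.score(X_tr, y_tr))
        val_scores.append(model.score(X_val, y_val))
    avg_train.append(np.mean(train_scores))
    avg_val.append(np.mean(val_scores))

plt.plot(alphas, avg_train, "r", label="avg training accuracy")
plt.plot(alphas, avg_val, "b", label="avg validation accuracy")
plt.xscale("log")
plt.xlabel("alpha")
plt.ylabel("accuracy")
plt.legend()
plt.show()

best_alpha = alphas[int(np.argmax(avg_val))]
print("Best alpha:", best_alpha)

# evaluate the final model on the test set
final = LogisticRegression(C=1.0 / best_alpha, penalty="l1", solver="liblinear")
final.fit(X_train, y_train)
y_pred = final.predict(X_test)
print("Non-zero features:", np.count_nonzero(final.coef_))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))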
I wrote this code:
from sklearn.metrics import classification_report

print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(Newclassifier.score(X_test, y_test)))
y_pred = Newclassifier.predict(X_test)  # predictions are needed for the classification report
print(classification_report(y_test, y_pred))
My questions are:
1- Why does the accuracy decrease in each iteration?
2- Is my model overfitting or underfitting?
3- Is my model working correctly?
There is no official/absolute metric for deciding whether you are underfitting, overfitting, or neither. In practice:
Underfitting: your model is too simple. There will not be much difference between the training and validation accuracy, but the accuracy will be fairly low on both.
Overfitting: your model is too complex. Instead of learning the underlying patterns, it memorizes your training set. So the training error keeps decreasing, but the validation error starts increasing after some point.
In your case, your training and testing errors seem to move in parallel, so you don't appear to have an overfitting problem. Your model could be underfitting, so you could try a more complex model. However, it is also possible that this is as good as this algorithm can get on this particular training set; in most real problems, no algorithm reaches zero error.
As to why your error increases: I don't know how this particular algorithm works internally, but since it relies on random splits, some fluctuation is reasonable behavior. The error goes a bit up and down, but it does not steadily increase, so it does not seem problematic.
Related
I have training data with 3961 rows and 32 columns that I want to fit with a Random Forest and a Gradient Boosting model. While training, I need to fine-tune the models' hyperparameters to get the best AUC possible. To do so, I minimize the quantity 1-AUC(Y_real, Y_pred) using the Basin-Hopping algorithm from SciPy; so my training and internal validation subsamples are the same.
When the optimization is finished, I get an AUC = 0.994 for the Random Forest and an AUC = 1 for the Gradient Boosting model. Am I overfitting these models? How can I tell when overfitting is taking place during training?
To know if you are overfitting, you have to compute:
Training set accuracy (or 1-AUC in your case)
Test set accuracy (or 1-AUC in your case); you can use a validation set here if you have one
Once you have calculated these scores, compare them. If the training set score is much better than your test set score, then you are overfitting. This means that your model is "memorizing" your data instead of learning from it to make future predictions.
You always need to go through this process to know whether you are overfitting. However, if your training accuracy or score looks suspiciously perfect (e.g. an accuracy of 100%), that alone is a strong hint that you are overfitting.
So, if you don't have separate training and test data, create them using sklearn.model_selection.train_test_split. Then you will be able to compare both scores. Otherwise, you won't be able to know with confidence whether you are overfitting.
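A minimal sketch of that comparison for the Random Forest (X and y are assumed to be your 3961 x 32 feature matrix and binary labels; the hyperparameters are placeholders, not part of the original answer):

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=300, random_state=0)
model.fit(X_train, y_train)

train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print("train AUC:", train_auc)
print("test AUC:", test_auc)   # a large gap (train much higher than test) indicates overfitting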
I am trying to fit a really small dataset with only 16 training examples. When I use decision tree regression, it gives exact values on the training set but fails on the test set. I can't figure out why this is happening.
from sklearn.tree import DecisionTreeRegressor

# max_depth=50 is far deeper than 16 samples need, so the tree can memorize every training point
model = DecisionTreeRegressor(max_depth=50, max_features="auto")
model.fit(X, Y)
I guess that by "it fails in the test set" you actually mean that you get a low testing accuracy.
This is a perfect example of an overfitted model. By definition, overfitting occurs when the training accuracy (100% in your case) is much greater than the testing/validation accuracy. This means that your model has learned patterns in the training data that are not applicable or valid in the wider population.
There are many techniques that can help you tackle overfitting. It might be worth starting with K-Fold cross-validation.
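As a hedged illustration of that suggestion, a K-fold cross-validation run on this tiny dataset might look like the sketch below (cv=4 and the shallower max_depth are assumptions made for the example, not part of the original code):

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# with only 16 samples, 4 folds leave 12 samples for training and 4 for validation in each fold
model = DecisionTreeRegressor(max_depth=3)   # a shallower tree is less able to memorize
scores = cross_val_score(model, X, Y, cv=4, scoring="r2")
print("R^2 per fold:", scores)
print("mean R^2:", scores.mean())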
I have trained an LSTM model for Time Series Forecasting. I have used an early stopping method with a patience of 150 epochs.
I have used a dropout of 0.2, and this is the plot of train and validation loss:
The early stopping method stopped training after 650 epochs and saved the best weights from around epoch 460, where the validation loss was lowest.
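For reference, the early-stopping setup described here typically looks something like the Keras sketch below (a minimal sketch assuming tf.keras; the model, data variables and epoch count are placeholders, not the asker's actual code):

import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=150,                 # stop after 150 epochs without improvement
    restore_best_weights=True)    # roll back to the weights with the lowest validation loss

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=1000,
    callbacks=[early_stop])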
My question is:
Is it normal that the train loss is always above the validation loss?
I know that if it were the opposite (validation loss above the training loss), it would be a sign of overfitting.
But what about this case?
EDIT:
My dataset is a time series with hourly frequency. It is composed of 35000 instances. I have split the data into 80% training and 20% validation, but in temporal order. So, for example, the training set contains the data up to the beginning of 2017 and the validation set the data from 2017 until the end.
I have created this plot by averaging the data over 15 days and this is the result:
So maybe the reason is, as you said, that the validation data have an easier pattern. How can I solve this problem?
In most cases, the validation loss should be higher than the training loss because the labels in the training set are directly accessible to the model. In fact, a good habit when training a new network is to use a small subset of the data and check whether the training loss can converge to 0 (i.e. fully overfit that subset). If it cannot, the model does not even have enough capacity to memorize the data.
Let's go back to your problem. The observation that the validation loss is lower than the training loss does happen. But this is possibly not because of your model, but because of how you split the data. Suppose there are two types of patterns (A and B) in the dataset, and you split the data so that the training set contains both patterns while the small validation set contains only pattern B. If B is easier to recognize, then you might get a higher training loss than validation loss.
As a more extreme example, suppose pattern A is almost impossible to recognize but makes up only 1% of the dataset, while the model can recognize all of pattern B. If the validation set happens to contain only pattern B, then the validation loss will be smaller.
As alex mentioned, using K-fold cross-validation is a good way to make sure every sample is used as both validation and training data. Also, printing out the confusion matrix to check that all labels are relatively balanced is another thing to try.
Usually the opposite is true. But since you are using dropout, it is common for the validation loss to be lower than the training loss (dropout is active during training but disabled at evaluation time). And, like others have suggested, try k-fold cross-validation.
I have trained my model on a set of 29K images covering 36 classes and validated it on 7K images. The model has a training accuracy of 94.59% and a validation accuracy of 95.72%.
It was created for OCR on digits and characters. I know the number of training images for 36 classes might not be sufficient. I'm not certain what to infer from these results.
Question: Is this a good result? Should the testing accuracy always be greater than training accuracy? Is my model overfitting?
Question: How would I know if my model was overfitting? I'm assuming a very high training accuracy and very low testing accuracy would indicate that?
95% is rather good for 36 classes. If your validation accuracy is higher than your training accuracy, you are underfitting. You can run some more epochs until your training accuracy is a bit higher than your validation accuracy.
Exactly, if training accuracy is much higher, you are overfitting.
The training accuracy should normally be higher than the testing/validation accuracy, because your model first has to fit the data it is given before it can predict unknown data. However, the opposite does happen sometimes, and the reason could be:
a. The test set wasn't randomly selected, or it was randomly selected but happened to be a favourable one (a coincidence).
b. Your model generalizes very well, possibly in combination with point (a).
Check the learning curve first; a case where the training accuracy is the lower of the two is rare. The solution may be more data, a more complex model, or more epochs (the usual remedies for underfitting).
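If you trained with Keras, a quick way to inspect that learning curve is to plot the per-epoch accuracies stored in the History object. A minimal sketch, assuming history is the object returned by model.fit(..., validation_data=...) (that variable name is an assumption, not from the question):

import matplotlib.pyplot as plt

# older Keras versions use the keys "acc" / "val_acc" instead of "accuracy" / "val_accuracy"
plt.plot(history.history["accuracy"], label="training accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()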
The code in this TensorFlow tutorial uses this section of the code to calculate the validation accuracy, right?
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": eval_data},
    y=eval_labels,
    num_epochs=1,
    shuffle=False)
eval_results = mnist_classifier.evaluate(input_fn=eval_input_fn)
print(eval_results)
Question: So, to calculate the training set accuracy (that is, to see if my model is overfitting the training data), if I changed the value of "x" to train_data and fed the training data in for evaluation as well, would that give me the training set accuracy?
If not, how do I check if my model is overfitting my dataset?
How does the number of steps affect the accuracy?
For example, if I have trained it for 20000 steps and then train it for another 100, why does the accuracy change? Is it because the weights are being recalculated? Would it be advisable to do something like this then?
mnist_classifier.train(
    input_fn=train_input_fn,
    steps=20000,
    hooks=[logging_hook])
Normally you have 3 datasets: one for training, one for validation and one for testing. All these datasets have to be disjoint; an image from the training set may not occur in the validation or test set, and so on. You train with the training set and, after each epoch, you validate the model with the validation data. The optimizer will always try to update the weights to classify the training data perfectly, so the training accuracy will get very high (>90%). The validation data is data the model has never seen before; validation is done after each epoch (or every x steps) to show how well the model reacts to data it hasn't seen before, and this shows how the model improves over time.
The more you train, the higher the training accuracy will become, since the optimizer does its best to drive that value towards 100%. The validation accuracy, which does not update the weights, also increases over time, but not continuously. While the training accuracy keeps improving, the validation accuracy might stop improving; the moment the validation accuracy starts decreasing over time, you are overfitting. This means the model is focusing too much on the training data and can no longer classify a character correctly if it differs from the training set.
At the end of all the training you use the test set; this determines the actual accuracy of your model on new data.
@xmacz: I cannot add comments yet, only answers, so I am just updating my answer. Yes, I checked the source code: your first lines of code evaluate the model on the test data.
evaluate is just a function that performs some numerical operations on the input data and produces an output. If you feed it the test data it gives you the test accuracy, and if you feed it the training data it outputs the training accuracy.
At the end of the day it is just mathematics; what the output means intuitively is something you have to ascertain yourself.
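Concretely, a minimal sketch of that idea, reusing the TF 1.x Estimator API from the tutorial snippet above (train_data and train_labels are assumed to be the same arrays used to build train_input_fn):

# build an input function over the *training* data, without shuffling
train_eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": train_data},
    y=train_labels,
    num_epochs=1,
    shuffle=False)

train_results = mnist_classifier.evaluate(input_fn=train_eval_input_fn)
print("training set metrics:", train_results)
# compare train_results["accuracy"] with eval_results["accuracy"];
# a large gap (training much higher than evaluation) suggests overfitting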
Checking whether your model is overfitting is something you do while training it. You have to set apart another set, called the validation set, which is different from the training and test sets. A typical split is 70%-20%-10% for training, testing and validation respectively.
During training, every n steps you evaluate your model on the validation set. During the first iterations the score on your validation set gets better, but at some point it starts to get worse. You can use this information to stop your training when your model starts to overfit, but doing it right is an art. You could, for instance, stop after the validation accuracy has decreased for 5 consecutive evaluations, because sometimes it gets worse and then improves again at the next evaluation. It's hard to say; it depends on many factors.
Regarding your second question: iterating for another 100 steps could make your model better or worse, depending on whether it is overfitting or not, so I'm afraid that question doesn't have a clear answer. The weights rarely stop changing, because each iteration/step keeps "moving" them, for better or worse. Again, it's difficult to say how to get good results, but you could try early stopping with a validation set, as mentioned before.
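As a hedged sketch of that early-stopping idea with the Estimator from the question (the 1000-step chunks, the use of validation loss, and a patience of 5 are assumptions made for illustration):

best_loss = float("inf")
patience, bad_evals = 5, 0
while bad_evals < patience:
    mnist_classifier.train(input_fn=train_input_fn, steps=1000)   # train in small chunks
    loss = mnist_classifier.evaluate(input_fn=eval_input_fn)["loss"]
    if loss < best_loss:
        best_loss, bad_evals = loss, 0     # validation improved, reset the counter
    else:
        bad_evals += 1                     # validation got worse, count it
# training stops after 5 consecutive evaluations without improvement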