Just wanted to check. I have this plot of lasso on training dataset and cross-validation dataset. The red curve is training error while black is cross-validation error.
This curve looks fine, right? With increasing alpha, the training error should increase, while the cross-validation error should go down.
Is this understanding correct?
From the plot itself, your understanding is correct. It looks like after the alpha value indicated by the vertical line, the validation error also starts to increase. That may be caused by under-fitting, with too much bias in the model.
It is hard to tell whether this plot itself is correct without knowing the data though.
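For reference, a curve like this can be generated with scikit-learn's validation_curve. Here is a minimal sketch; since the original data set is unknown, make_regression stands in for it:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import validation_curve

# Synthetic stand-in for the (unknown) original data
X, y = make_regression(n_samples=100, n_features=30, noise=10.0, random_state=0)

alphas = np.logspace(-2, 2, 10)
train_scores, cv_scores = validation_curve(
    Lasso(max_iter=10000), X, y,
    param_name="alpha", param_range=alphas,
    scoring="neg_mean_squared_error", cv=5,
)

# Convert the negated-MSE scores back to errors, averaged over folds
train_error = -train_scores.mean(axis=1)
cv_error = -cv_scores.mean(axis=1)
```

Plotting train_error and cv_error against alphas (log scale) reproduces the red/black curves: training error rises with alpha as the penalty shrinks the coefficients, while cross-validation error typically falls and then rises again once the model starts under-fitting.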
I trained a model using CNN,
The results are:
[training/validation Accuracy and Loss plots omitted]
I read on fast.ai that, according to the experts, a good model has a val_loss slightly greater than the loss.
My model differs on these points, so can I consider this model good, or do I need to train again?
Thank you
It seems that maybe your validation set is "too easy" for the CNN you trained, or doesn't represent your problem well enough. It's difficult to say with the amount of information you provided.
I would say your validation set is not properly chosen. You could also try using cross-validation to get more insight.
First of all, as already pointed out, I'd check your data setup: how many data points do the training and test (validation) data sets contain, and have they been properly chosen?
Additionally, this is an indicator that your model might be underfitting the training data. Generally, you are looking for the "sweet spot" in the trade-off between predicting your training data well and not overfitting (i.e. doing badly on the test data):
If accuracy on the training data is not high, then you might be underfitting (left-hand side of the green dashed line). Doing worse on the training data than on the test data is an indication of that (although this case is not shown in the plot). Therefore, I'd check what happens to your accuracy if you increase the model complexity and fit the training data better (moving towards the "sweet spot" in the illustrative graph).
In contrast, if your test data accuracy were low, you might be in an overfitting situation (right-hand side of the green dashed line).
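The trade-off can be illustrated with a small synthetic experiment: fitting polynomials of increasing degree to noisy sine data. Training error keeps dropping as complexity grows, while test error is best somewhere in the middle (the "sweet spot"). The data and degrees here are of course made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3, 3, 60))[:, None]
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

train_err, test_err = {}, {}
for degree in (1, 4, 15):  # underfit, roughly right, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    train_err[degree] = mean_squared_error(y_tr, model.predict(X_tr))
    test_err[degree] = mean_squared_error(y_te, model.predict(X_te))
```

Degree 1 does badly on both sets (high bias, left of the dashed line); degree 15 does very well on the training set but relatively worse on the test set (high variance, right of the dashed line).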
Most of my code is based on this article and the issue I'm asking about is evident there, but also in my own testing. It is a sequential model with LSTM layers.
Here is a plotted prediction over real data from a model that was trained with around 20 small data sets for one epoch.
Here is another plot but this time with a model trained on more data for 10 epochs.
What causes this and how can I fix it? Also, the first link I sent shows the same result at the bottom: 1 epoch does great and 3500 epochs is terrible.
Furthermore, when I run a training session for the higher data count but with only 1 epoch, I get identical results to the second plot.
What could be causing this issue?
A few questions:
Is this graph for training data or validation data?
Do you consider it better because:
The graph seems cool?
You actually have a better "loss" value?
If so, was it training loss?
Or validation loss?
Cool graph
The early graph seems interesting, indeed, but take a close look at it:
I clearly see huge predicted valleys where the expected data should be a peak
Is this really better? It looks like a random wave that is completely out of phase, meaning that a straight line would actually represent a better loss than this.
Take a look at the "training loss"; this is what can surely tell you whether your model is better or not.
If this is the case and your model isn't reaching the desired output, then you should probably make a more capable model (more layers, more units, a different method, etc.). But be aware that many datasets are simply too random to be learned, no matter how good the model.
Overfitting - Training loss gets better, but validation loss gets worse
If you actually have a better training loss, then OK: your model is indeed getting better.
Are you plotting training data? - Then this straight line is actually better than a wave out of phase
Are you plotting validation data?
What is happening with the validation loss? Better or worse?
If your "validation" loss is getting worse, your model is overfitting. It's memorizing the training data instead of learning generally. You need a less capable model, or a lot of "dropout".
Often, there is an optimal point where the validation loss stops going down, while the training loss keeps going down. This is the point to stop training if you're overfitting. Read about the EarlyStopping callback in keras documentation.
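In keras this would be something like EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True). The underlying logic can be sketched in a few lines of plain Python (the function name and loss values here are made up for illustration):

```python
# Minimal sketch of the early-stopping logic: stop when the validation
# loss hasn't improved for `patience` consecutive epochs, and remember
# the best epoch so its weights can be restored.
def early_stopping_index(val_losses, patience=3):
    """Return (epoch at which training stops, epoch with best val loss)."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0  # improvement: reset
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # patience exhausted: stop
    return len(val_losses) - 1, best_epoch

# Validation loss drops, then creeps back up as the model overfits
losses = [0.9, 0.7, 0.5, 0.45, 0.46, 0.48, 0.50, 0.55]
stop_at, best_at = early_stopping_index(losses, patience=3)
```

Training halts at epoch 6, three epochs after the minimum at epoch 3, and the epoch-3 weights are the ones you'd keep.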
Bad learning rate - Training loss is going up indefinitely
If your training loss is going up, then you've got a real problem there, either a bug, a badly prepared calculation somewhere if you're using custom layers, or simply a learning rate that is too big.
Reduce the learning rate (divide it by 10, or 100), create and compile a "new" model and restart training.
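Why a too-large rate makes the loss go up can be seen on the simplest possible example, gradient descent on f(x) = x² (a toy illustration, not the questioner's model):

```python
# Gradient descent on f(x) = x**2, whose gradient is 2x. The update is
# x <- x - lr * 2x = x * (1 - 2*lr): with lr > 1 the factor's magnitude
# exceeds 1 and every step overshoots further; with a small lr it shrinks.
def final_loss(lr, steps=50, x=1.0):
    for _ in range(steps):
        x = x - lr * 2 * x  # one gradient step
    return x * x

diverged = final_loss(1.1)   # |x| grows by 1.2x per step: loss explodes
converged = final_loss(0.1)  # |x| shrinks by 0.8x per step: loss -> 0
```

The same mechanism, compounded across many weights and batches, is what a steadily rising training loss usually means.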
Another problem?
Then you need to detail your question properly.
I'm trying to understand how Perceptron from sklearn.linear_model performs its fit() function (Documentation). The question comes from this piece of code:
clf = Perceptron()
clf.fit(train_data, train_answers)
print('accuracy:', clf.score(train_data, train_answers))
accuracy: 0.7
I thought the goal of fitting was to create a classification function that gives answers with 100% accuracy on the training data, but in the example above it gives only 70%. I have tried one more data set, where accuracy was 60%.
What do I misunderstand about the fitting process?
It depends on your training data's pattern distribution. In the graph shown below, could you find a straight line to separate blue and red? Obviously not, and this is the point: your data must be linearly separable for the Perceptron Learning Algorithm to achieve 100% accuracy on the training data. Otherwise, no straight line can separate it perfectly.
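The classic demonstration is AND versus XOR on four points. AND is linearly separable, so the perceptron convergence theorem guarantees a perfect fit; XOR is not, so no linear model can score 100% (the specific hyperparameters below are just one reasonable choice):

```python
from sklearn.linear_model import Perceptron

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_and = [0, 0, 0, 1]  # linearly separable: a line can isolate (1, 1)
y_xor = [0, 1, 1, 0]  # NOT linearly separable

# tol=None forces the full max_iter epochs instead of early stopping
clf = Perceptron(max_iter=100, tol=None, random_state=0)
acc_and = clf.fit(X, y_and).score(X, y_and)  # reaches 1.0
acc_xor = clf.fit(X, y_xor).score(X, y_xor)  # stuck below 1.0
```

This is exactly why fit() on real, noisy, non-separable data tops out at 70% or 60%: the perceptron minimizes mistakes with a single hyperplane, and no hyperplane does better on that data.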
Hi, here is an image of my code for the prediction values, using the SVM technique, but I cannot understand the result...
and here is the result of my code
and here is the data...
I cannot understand the result of my code... please, will you help me out?
The result is pretty trivial. You are looking at accuracy on your training set, and you are using an SVM with an rbf kernel, which can overfit to any dataset, and it does so perfectly: it simply remembers all your training points, and thus produces perfect predictions on the training points. This is not the way you are supposed to do evaluation; you need train-test splits (for a problem of that size, many of them, probably strengthened by cross-validation).
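The memorization effect is easy to reproduce. With a very large gamma, the rbf kernel treats every training point as its own island, so training accuracy is perfect while held-out accuracy collapses (synthetic data and hyperparameters chosen here purely to exaggerate the effect):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# A huge gamma makes the rbf kernel nearly an identity matrix on the
# training points: each point is only "similar" to itself, so the model
# memorizes the training set.
clf = SVC(kernel="rbf", gamma=50.0, C=100.0).fit(X_tr, y_tr)
train_acc = clf.score(X_tr, y_tr)  # near-perfect
test_acc = clf.score(X_te, y_te)   # close to chance
```

Always report the test-set score; the training score of a flexible model tells you almost nothing.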
I have an rbf SVM that I'm tuning with gridsearchcv. How do I tell if my good results are actually good results or whether they are overfitting?
Overfitting is generally associated with high variance, meaning that the model parameters that would result from being fitted to some realized data set have a high variance from data set to data set. You collected some data, fit some model, got some parameters ... you do it again and get new data and now your parameters are totally different.
One consequence of this is that in the presence of overfitting, usually the training error (the error from re-running the model directly on the data used to train it) will be very low, or at least low in contrast to the test error (running the model on some previously unused test data).
One diagnostic that is suggested by Andrew Ng is to separate some of your data into a testing set. Ideally this should have been done from the very beginning, so that happening to see the model fit results inclusive of this data would never have the chance to impact your decision. But you can also do it after the fact as long as you explain so in your model discussion.
With the test data, you want to compute the same error or loss score that you compute on the training data. If training error is very low, but testing error is unacceptably high, you probably have overfitting.
Further, you can vary the size of your test data and generate a diagnostic graph. Let's say that you randomly sample 5% of your data, then 10%, then 15% ... on up to 30%. This will give you six different data points showing the resulting training error and testing error.
As you increase the training set size (decrease testing set size), the shape of the two curves can give some insight.
The test error will be decreasing and the training error will be increasing. The two curves should flatten out and converge with some gap between them.
If that gap is large, you are likely dealing with overfitting; it suggests using a large training set and trying to collect more data if possible.
If the gap is small, or if the training error itself is already too large, it suggests that model bias is the problem, and you should consider a different model class altogether.
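scikit-learn automates this diagnostic with learning_curve, which varies the training-set size directly rather than the test-set fraction; the effect is the same. A sketch on synthetic data (the estimator and sizes are just placeholders for your tuned SVM):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Six training-set sizes, each evaluated with 5-fold cross-validation
sizes, train_scores, test_scores = learning_curve(
    SVC(kernel="rbf"), X, y,
    train_sizes=np.linspace(0.1, 1.0, 6),
    cv=5, shuffle=True, random_state=0,
)
train_mean = train_scores.mean(axis=1)
test_mean = test_scores.mean(axis=1)
gap = train_mean - test_mean  # large persistent gap -> overfitting
```

Plot train_mean and test_mean against sizes: a large persistent gap points to overfitting, while two low curves that have already converged point to bias.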
Note that in the above setting, you can also substitute k-fold cross-validation for the test-set approach. Then, to generate a similar diagnostic curve, you should vary the number of folds (hence varying the size of the test sets). For a given value of k, each subset is used once for testing while the other (k-1) subsets are used for training, and the errors are averaged over each way of assigning the folds. This gives you both a training error and a testing error metric for a given choice of k. As k becomes larger, the training set size becomes bigger (for example, if k=10, then training errors are reported on 90% of the data), so again you can see how the scores vary as a function of training set size.
The downside is that CV scores are already expensive to compute, and repeated CV for many different values of k makes it even worse.
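The vary-k version can be written directly with cross_validate and its return_train_score option; again, the estimator and data below are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

train_mean, test_mean = {}, {}
for k in (2, 5, 10):
    # With cv=k, each fit trains on (k-1)/k of the data, so larger k
    # means a larger effective training set (k=10 trains on 90%)
    scores = cross_validate(SVC(kernel="rbf"), X, y, cv=k,
                            return_train_score=True)
    train_mean[k] = scores["train_score"].mean()
    test_mean[k] = scores["test_score"].mean()
```

Comparing train_mean and test_mean across k gives the same gap diagnostic as the learning curve, at the cost of fitting the model k times per value of k.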
One other cause of overfitting can be too large of a feature space. In that case, you can try to look at importance scores of each of your features. If you prune out some of the least important features and then re-do the above overfitting diagnostic and observe improvement, it's also some evidence that the problem is overfitting and you may want to use a simpler set of features or a different model class.
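One common way to get such importance scores is a tree ensemble's feature_importances_. A sketch, using synthetic data where the informative columns are known by construction (the threshold choice is just one convention):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 5 informative features hidden among 20; with shuffle=False the
# informative ones are the first five columns.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = clf.feature_importances_  # sums to 1 across features

# Prune everything below the mean importance, then re-run the
# overfitting diagnostic on the reduced feature set
keep = importances > importances.mean()
X_pruned = X[:, keep]
```

If the diagnostic gap shrinks after pruning, an oversized feature space was likely part of the problem.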
On the other hand, if you still have high bias, it suggests the opposite: your model doesn't have enough feature space to adequately account for the variability of the data, so instead you may want to augment the model with even more features.