Result from Support Vector Machine confusing - python

Hi, here is the code (posted as an image) for computing the prediction value with the SVM technique, but I cannot understand the result.
Here is the result of my code, and here is the data.
I cannot understand the result of my code. Please, can you help me out?

The result is pretty trivial. You are looking at accuracy on your training set, and you are using an SVM with an RBF kernel, which can overfit to essentially any dataset, and it does so perfectly here: it simply memorizes all your training points and thus produces perfect predictions on those same points. This is not how you are supposed to evaluate a model; you need train-test splits (for a problem of that size, many of them, preferably strengthened by cross-validation).
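A minimal sketch of that kind of evaluation, assuming the features and labels are already in arrays X and y (placeholder names, since the original code is only shown as an image):

from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVC

# X and y are placeholders for the data from the question
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SVC(kernel='rbf')
clf.fit(X_train, y_train)

# accuracy on data the model has never seen, instead of on the training set
print("test accuracy:", clf.score(X_test, y_test))

# cross-validation gives a more stable estimate on a small dataset
print("5-fold CV accuracy:", cross_val_score(SVC(kernel='rbf'), X, y, cv=5).mean())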

Related

Are these Keras loss and accuracy curves weird?

I have a relatively small MRI dataset and I'm trying to do a binary segmentation. I have built an ordinary U-Net architecture and trained it.
But the output seems a bit weird to me. Both training and validation accuracies were stuck at one value at first, but then both made a sudden big jump around the 27th or 28th epoch.
The loss graph looks more acceptable. Here are the graphs:
(Accuracy graph and loss graph attached.)
I have another issue: even though I have 97-98% accuracy on the training data, when I tested the model on some images from the training data, the results converted to a binary mask were not that good.
Then I decreased the threshold from 0.5 to 0.35 when retrieving the output images, and the results were almost perfect.
What do you think about that? thanks in advance.
They do seem a little off with those stuck epochs; it really means the model isn't learning during that phase (the weights are not changing, new cases are not providing useful information), but that is entirely plausible.
Just to be sure: what optimizer are you using, and did you try another one?
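A minimal sketch of the two things discussed in this thread, switching the optimizer and lowering the binarization threshold; model and X_val are placeholder names for the U-Net and the validation images:

import tensorflow as tf

# try a different optimizer (and learning rate) when accuracy stays stuck
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss='binary_crossentropy',
              metrics=['accuracy'])

probs = model.predict(X_val)             # per-pixel probabilities in [0, 1]
mask = (probs > 0.35).astype('uint8')    # binarize with the lower threshold instead of 0.5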

Why do more epochs make my model worse?

Most of my code is based on this article, and the issue I'm asking about is evident there as well as in my own testing. It is a sequential model with LSTM layers.
Here is a plotted prediction over real data from a model that was trained with around 20 small data sets for one epoch.
Here is another plot but this time with a model trained on more data for 10 epochs.
What causes this, and how can I fix it? The first link I sent also shows the same result at the bottom - 1 epoch does great and 3500 epochs are terrible.
Furthermore, when I run a training session with the higher data count but only 1 epoch, I get results identical to the second plot.
What could be causing this issue?
A few questions:
Is this graph for training data or validation data?
Do you consider it better because:
The graph seems cool?
You actually have a better "loss" value?
If so, was it training loss?
Or validation loss?
Cool graph
The early graph seems interesting, indeed, but take a close look at it:
I clearly see huge predicted valleys where the expected data has a peak.
Is this really better? It looks like a random wave that is completely out of phase, which means that a flat line would actually give a better loss than this.
Take a look at the "training loss"; this is what can tell you for sure whether your model is better or not.
If this is the case and your model isn't reaching the desired output, then you should probably make a more capable model (more layers, more units, a different method, etc.). But be aware that many datasets are simply too random to be learned, no matter how good the model is.
Overfitting - Training loss gets better, but validation loss gets worse
In case you actually have a better training loss: OK, so your model is indeed getting better.
Are you plotting training data? - Then this straight line is actually better than a wave out of phase
Are you plotting validation data?
What is happening with the validation loss? Better or worse?
If your "validation" loss is getting worse, your model is overfitting. It's memorizing the training data instead of learning generally. You need a less capable model, or a lot of "dropout".
Often, there is an optimal point where the validation loss stops going down, while the training loss keeps going down. This is the point to stop training if you're overfitting. Read about the EarlyStopping callback in keras documentation.
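A minimal sketch of that callback, assuming a compiled Keras model and training/validation arrays named X_train, y_train, X_val, y_val (placeholder names):

from tensorflow.keras.callbacks import EarlyStopping

# stop once the validation loss has not improved for 10 epochs, keeping the best weights
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          epochs=3500,
          callbacks=[early_stop])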
Bad learning rate - Training loss is going up indefinitely
If your training loss is going up, then you have a real problem: either a bug, a badly prepared calculation somewhere (if you're using custom layers), or simply a learning rate that is too big.
Reduce the learning rate (divide it by 10, or by 100), create and compile a "new" model, and restart training.
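A minimal sketch of that restart, assuming a helper build_model() that recreates the architecture (the name is a placeholder):

from tensorflow.keras.optimizers import Adam

# rebuild from scratch so the old, possibly diverged weights are discarded
model = build_model()
model.compile(optimizer=Adam(learning_rate=1e-4),   # e.g. the previous rate divided by 10
              loss='mse')
model.fit(X_train, y_train, epochs=10)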
Another problem?
Then you need to detail your question properly.

sklearn: Naive Bayes classifier gives low accuracy

I have a dataset which includes 200000 labelled training examples.
For each training example I have 10 features, including both continuous and discrete.
I'm trying to use Python's sklearn package to train the model and make predictions, but I have some troubles (and some questions, too).
First let me write the code which I have written so far:
from sklearn.naive_bayes import GaussianNB
# data contains the 200 000 examples
# targets contain the corresponding labels for each training example
gnb = GaussianNB()
gnb.fit(data, targets)
predicted = gnb.predict(data)
The problem is that I get really low accuracy (too many misclassified labels) - around 20%.
However I am not quite sure whether there is a problem with the data (e.g. more data is needed or something else) or with the code.
Is this the proper way to implement a Naive Bayes classifier given a dataset with both discrete and continuous features?
Furthermore, in Machine Learning we know that the dataset should be split into training and validation/testing sets. Is this automatically performed by sklearn or should I fit the model using the training dataset and then call predict using the validation set?
Any thoughts or suggestions will be much appreciated.
The problem is that I get really low accuracy (too many misclassified labels) - around 20%. However I am not quite sure whether there is a problem with the data (e.g. more data is needed or something else) or with the code.
This is not a surprisingly bad result for Naive Bayes; it is an extremely simple classifier and you should not expect it to be strong, so more data probably won't help. Your Gaussian estimators are probably already very good; the naive independence assumptions are simply the problem. Use a stronger model. You can start with a Random Forest, since it is very easy to use even for non-experts in the field.
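A minimal sketch of swapping in a Random Forest, reusing the data and targets arrays from the question:

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(data, targets)
predicted = rf.predict(data)   # still training-set predictions; see the split discussed below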
Is this the proper way to implement a Naive Bayes classifier given a dataset with both discrete and continuous features?
No, it is not; you should use different distributions for the discrete features. However, scikit-learn does not support that, so you would have to do it manually. As said before: change your model.
Furthermore, in Machine Learning we know that the dataset should be split into training and validation/testing sets. Is this automatically performed by sklearn or should I fit the model using the training dataset and then call predict using the validation set?
Nothing is done automatically in this manner; you need to do it on your own (scikit-learn has lots of tools for that - see the cross-validation packages).
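A minimal sketch of that split and of cross-validation, again reusing data and targets from the question:

from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import GaussianNB

X_train, X_test, y_train, y_test = train_test_split(data, targets, test_size=0.2, random_state=0)

gnb = GaussianNB()
gnb.fit(X_train, y_train)                          # fit only on the training portion
print("held-out accuracy:", gnb.score(X_test, y_test))

# 5-fold cross-validation as a more stable estimate
print("CV accuracy:", cross_val_score(GaussianNB(), data, targets, cv=5).mean())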

Does scikit-learn's .fit(X, y) method work sequentially, and if not, how does it work?

I am using scikit-learn's SVM library for classifying images. I was wondering: when I fit new data, does it work sequentially, or does it erase the previous classification material and re-fit to the new data? For example, if I fit 100 images to the classifier, can I then sequentially fit another 100 images, or will the SVM delete the work it performed on the original 100 images? This is difficult for me to explain, so I'll provide an example:
In order to fit an SVM classifier to 200 images, can I do this:
clf=SVC(kernel='linear')
clf.fit(test.data[0:100], test.target[0:100])
clf.fit(test.data[100:200], test.target[100:200])
Or must I do this:
clf=SVC(kernel='linear')
clf.fit(test.data[:200], test.target[:200])
I am asking only because I run into memory errors when trying to use .fit(X, y) with too many images at once. So is it possible to use fit sequentially and "increment" my classifier upwards, so that it is technically trained on 10000 images but only 100 at a time?
If this is possible, please confirm and explain; and if it is not possible, please explain why.
http://scikit-learn.org/stable/developers/index.html#estimated-attributes
The last-mentioned attributes are expected to be overridden when you
call fit a second time without taking any previous value into account:
fit should be idempotent.
https://en.wikipedia.org/wiki/Idempotent
So yes, the second call will erase the old model and compute a new one. You can check this yourself if you read the Python code, for example in sklearn/svm/classes.py.
I think you need minibatch training, but I don't see a partial_fit implementation for SVM; maybe that's because the scikit-learn team recommends SGDClassifier and SGDRegressor for datasets with more than 100k samples (http://scikit-learn.org/stable/tutorial/machine_learning_map/). Try to use them with minibatches as described here.
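A minimal sketch of minibatch training with partial_fit, assuming a hypothetical helper load_batches() that yields the images 100 at a time:

import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(loss='hinge')   # hinge loss gives a linear SVM trained with SGD
classes = np.arange(10)             # every class label must be declared on the first call

for X_batch, y_batch in load_batches(batch_size=100):   # load_batches is a placeholder
    clf.partial_fit(X_batch, y_batch, classes=classes)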

Cross-validation and Training error of Lasso

Just wanted to check: I have this plot of lasso on a training dataset and a cross-validation dataset. The red curve is the training error, while the black one is the cross-validation error.
This curve looks fine, right? With increasing alpha, the training error should increase while the cross-validation error goes down.
Is this understanding correct?
From the plot itself, your understanding is correct. It looks like after the alpha value indicated by the vertical line, the validation error also starts to increase. That may be caused by under-fitting, i.e. too much bias in the model.
It is hard to tell whether the plot itself is correct without knowing the data, though.
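A minimal sketch of producing such curves with scikit-learn, assuming the feature matrix and target are in placeholder arrays X and y:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso
from sklearn.model_selection import validation_curve

alphas = np.logspace(-4, 1, 30)
train_scores, cv_scores = validation_curve(
    Lasso(max_iter=10000), X, y,
    param_name='alpha', param_range=alphas,
    cv=5, scoring='neg_mean_squared_error')

# negate the scores so the curves show error (lower is better), averaged over folds
plt.semilogx(alphas, -train_scores.mean(axis=1), 'r', label='training error')
plt.semilogx(alphas, -cv_scores.mean(axis=1), 'k', label='cross-validation error')
plt.xlabel('alpha')
plt.legend()
plt.show()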
