sklearn svm SVC fails, but does not report fit_status_==1 - python
I am trying to fit a weighted linear SVC to the "noisy circles" dataset. For some reason, the weighted version finds a very poor decision function, yet libsvm reports that the fit was successful. My weights are not unusual, so I'm not sure why the algorithm fails. Worse, I'm not sure how to predict under what circumstances the algorithm will fail, or what to do about it.
Here is the offending code:
import numpy as np
import sklearn.datasets
import sklearn.svm
n = 200
factor = 0.3
SEED = 1
noisy_circles, c = sklearn.datasets.make_circles(n_samples=n, factor=factor,
weights = np.array([0.93301464, 0.92261151, 0.93367401, 0.38632274, 0.35437395,
0.43346701, 1.09297683, 1.19747184, 0.96349809, 0.32426173,
0.29397037, 1.03628304, 1.05908521, 1.10653401, 0.37677232,
0.35153446, 0.24747971, 0.90887151, 0.24463193, 0.85877582,
0.89405636, 1.03921294, 0.87729103, 1.1589434 , 0.93196245,
0.22982046, 0.82391095, 0.95794411, 0.39876209, 0.96383222,
0.91290011, 0.24322639, 0.41364025, 0.32605574, 0.3712862 ,
1.13075687, 0.33799184, 0.94422961, 0.96021123, 0.29392899,
0.40880845, 0.37780868, 0.4861022 , 1.06077845, 0.89866461,
1.07030338, 0.34269111, 0.86699042, 0.39481626, 0.33021158,
1.17056528, 0.24180542, 0.2446189 , 0.87293221, 0.91510412,
0.32998597, 0.37407169, 0.41486528, 0.42505555, 0.20065111,
0.38846804, 0.92251402, 0.99049091, 0.90580681, 0.97491595,
1.08819797, 0.26700098, 0.42487132, 0.93167479, 1.02463133,
0.89980578, 1.1096191 , 0.37254448, 0.2359968 , 0.28334117,
0.33311215, 1.08758973, 0.32901317, 1.13315268, 0.29888742,
0.14581565, 1.07038078, 1.03316864, 0.35451779, 0.45098287,
1.12772454, 1.08896868, 0.28236812, 0.46117373, 0.83258909,
1.174982 , 0.89901124, 0.12965322, 0.41543288, 0.17358532,
0.45842307, 0.42685333, 0.42375945, 0.210712 , 0.377017 ,
1.03517938, 0.9891231 , 1.07126936, 0.19820075, 1.1002386 ,
0.93338903, 1.1061464 , 0.20301447, 1.08130118, 0.34030289,
1.16104716, 0.15868522, 1.07481773, 0.94876721, 0.93468891,
0.3231601 , 1.04994012, 0.32166893, 0.90920628, 0.90999114,
1.03839278, 1.14232502, 0.18056755, 0.2639544 , 0.16631772,
1.10689008, 0.36852231, 0.20091628, 0.28666013, 1.05392917,
0.91207713, 1.13049957, 0.40367044, 0.33333911, 0.3380625 ,
1.0615807 , 0.30797683, 1.08206638, 0.39374589, 0.40647774,
0.23565583, 0.22030266, 0.33806818, 0.44739648, 0.94079254,
1.03878309, 0.84132066, 0.2772951 , 0.40448219, 1.14960352,
0.89091529, 0.97398981, 1.00992373, 0.87505294, 0.98439767,
1.13634672, 0.2694606 , 0.89735526, 0.21407159, 0.31951442,
0.37647624, 0.90387395, 0.36897273, 0.32483939, 0.42423936,
1.14167808, 0.88631001, 0.34304598, 1.12320881, 0.91640671,
1.0111603 , 0.8649317 , 0.97180267, 1.17381377, 0.4581278 ,
0.15286761, 1.14522941, 1.17181889, 1.02299728, 0.91620512,
0.18773065, 0.2600077 , 0.23665254, 0.20477831, 0.16430318,
0.38680433, 1.0352136 , 0.31850732, 1.02505276, 0.24500125,
1.01564276, 0.20866012, 0.2194238 , 0.37527691, 1.05327402,
0.18154061, 0.25013442, 0.99024356, 0.15072547, 0.87641354])
model = sklearn.svm.SVC(C=30., kernel="linear").fit(noisy_circles, c, sample_weight=weights)
print(model.coef_, model.intercept_, model.fit_status_)
Note that fit_status_ reports success. However, the fitted model parameters are nonsense. To see this, here is a plot of the data (with the size of each dot scaled by the weight of the point):
Here is the fitted line along the same range in x:
Whatever is happening here seems to be driving the decision surface off to infinity. At first I thought that it was my having such a large C that was simply overpowering the part of the SVM that was trying to learn anything, but reducing C to 0.0001 does not change anything.
What is going on with the algorithm that produces this counter-intuitive behavior? Under what circumstances should I expect the algorithm to fail in this way?
UPDATE: The nightly build of sklearn supports sample weights for LinearSVC. Switching over to LinearSVC, I am witnessing the same behavior when the loss is set to "hinge", but not for this particular set of weights. This causes me to suspect that there is some kind of ill-conditioning in the problem somewhere. I'm still not sure exactly what is happening, but possibly this sheds some light on the problem.
The problem doesn't lie in sample_weight or C; it lies in the linear nature of the kernel. You are trying to learn a non-linear decision boundary (circular in this case) using a function that cannot express anything but a linear decision boundary. This applies to both SVC(kernel="linear") and LinearSVC. In my experiments, simply using a non-linear kernel such as rbf completely solved it.
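A minimal sketch of that comparison, not the original experiment: the `noise=0.1` and `random_state=1` arguments are assumptions, and the sample weights are left out so that only the kernel changes between the two fits.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Assumed data settings: noise and random_state are illustrative choices.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=1)

# Same C, same data; only the kernel differs.
linear_acc = SVC(C=30.0, kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(C=30.0, kernel="rbf").fit(X, y).score(X, y)

print("linear kernel training accuracy:", linear_acc)
print("rbf kernel training accuracy:   ", rbf_acc)
```

No straight line can separate an inner circle from the ring around it, so the linear kernel should score well below the rbf kernel here regardless of the weights.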
In fact, all SVMs learn a linear boundary. So why does something like rbf perform well? The answer lies in the "kernel trick". Put simply, rbf implicitly maps the dataset into a higher-dimensional space, so that linearly separating the classes in that transformed space results in a non-linear boundary in the original space. Here is a more detailed explanation for it.
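To make the idea concrete, here is a small sketch that performs such a projection by hand. It is not what rbf does internally (the rbf feature space is implicit and infinite-dimensional); it only adds one hand-picked feature, the squared distance from the origin, and then fits a linear SVC in the resulting 3-D space. The `noise` and `random_state` values are assumptions.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Assumed data settings: noise and random_state are illustrative choices.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=1)

# Explicit lift to 3-D: append r^2 = x1^2 + x2^2 as a third feature.
r2 = (X ** 2).sum(axis=1, keepdims=True)
X_lifted = np.hstack([X, r2])

# A linear SVC in the lifted space: a plane there is a circle in the
# original 2-D space.
lifted = SVC(C=30.0, kernel="linear").fit(X_lifted, y)
lifted_acc = lifted.score(X_lifted, y)
print("linear SVC on lifted features, training accuracy:", lifted_acc)
```

The plane found in the lifted space mostly thresholds r^2, which corresponds to a circular boundary in the original coordinates.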
Update: As for how the weights contribute to the failure with the linear kernel, the answer most likely lies in the fact that the average weight assigned to the two classes is imbalanced. In particular, the average weight assigned to class 0 is 3 times higher than that assigned to class 1. Here are a few results that point to this conclusion:
The linear kernel learns a "reasonable" boundary (meaning the boundary lies within the input space of the samples) when the weights are all 1.0 or randomly generated.
The boundary is also reasonable if we balance the class weights by passing class_weight={0:(w[y==1].sum()/w[y==0].sum()),1:1} to the constructor.
If we create a weight imbalance in some other way (using uniform sample weights but different class weights, assigning class 1 a weight 3 times higher than class 0, or removing the weights altogether and making one class one third as frequent as the other), the above problem reappears.
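The checks above can be sketched roughly as follows. The weights here are hypothetical stand-ins drawn so that class 0 averages about three times the weight of class 1; they are not the weights from the question, and the `noise` and `random_state` values are also assumptions.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Assumed data settings: noise and random_state are illustrative choices.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=1)

# Hypothetical weights: class 0 averages about 1.0, class 1 about 0.3.
rng = np.random.default_rng(0)
w = np.where(y == 0,
             rng.uniform(0.8, 1.2, y.size),
             rng.uniform(0.1, 0.5, y.size))

ratio = w[y == 0].mean() / w[y == 1].mean()
print("average weight ratio (class 0 / class 1):", ratio)

# The balancing formula: rescale class 0 so that both classes carry the
# same total weight once class_weight and sample_weight are multiplied.
class_weight = {0: w[y == 1].sum() / w[y == 0].sum(), 1: 1}
total_0 = class_weight[0] * w[y == 0].sum()
total_1 = class_weight[1] * w[y == 1].sum()

unbalanced = SVC(C=30.0, kernel="linear").fit(X, y, sample_weight=w)
balanced = SVC(C=30.0, kernel="linear",
               class_weight=class_weight).fit(X, y, sample_weight=w)

print("unbalanced coef_, intercept_:", unbalanced.coef_, unbalanced.intercept_)
print("balanced coef_, intercept_:  ", balanced.coef_, balanced.intercept_)
```

Comparing the two printed coefficient vectors is one way to see whether the imbalance is what moves the boundary for a given set of weights.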
This imbalance seems to push the coefficients to zero, although the rbf kernel doesn't seem to be affected by it. As for why libsvm doesn't report the failure, unfortunately I do not know.
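Since fit_status_ only says whether the optimizer finished, one possible workaround is to check the fitted model directly. The sketch below is an assumption about what a useful check might look like, not a libsvm or scikit-learn feature: `predicts_single_class` is a hypothetical helper that flags a model whose predictions on the given data all fall in one class. The data settings are again illustrative.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC


def predicts_single_class(model, X):
    """Hypothetical helper: True if the model assigns every row of X to one class."""
    return np.unique(model.predict(X)).size < 2


# Assumed data settings: noise and random_state are illustrative choices.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=1)
model = SVC(C=30.0, kernel="rbf").fit(X, y)

print("fit_status_:", model.fit_status_)
print("single-class predictions:", predicts_single_class(model, X))
```

A model can pass the fit_status_ check and still fail this one, so running both after fitting gives a cheap extra signal that the boundary has left the data.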
