Error in plotting the decision boundary for SVC Laplace kernel - python

I'm trying to plot the decision boundary of an SVM classifier that uses a precomputed Laplace kernel (code below), along the lines of this scikit-learn post. As in the post, I'm taking the test points as the mesh grid values (xx, yy) and the training points as X and y. Fitting the SVC on the precomputed kernel of the training points works fine.
import numpy as np
#from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.metrics.pairwise import laplacian_kernel
#Load the iris data
iris_data = load_iris()
#Split the data and target
X = iris_data.data[:, :2]
y = iris_data.target
#Step size in mesh plot
h = 0.02
#Convert X and y to a numpy array
X = np.array(X)
y = np.array(y)
#Using Laplacian kernel - https://scikit-learn.org/stable/modules/metrics.html#laplacian-kernel
K = np.array(laplacian_kernel(X, gamma=.5))
svm = SVC(kernel='precomputed').fit(K, np.ravel(y))
# create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
np.arange(y_min, y_max, h))
# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
#plt.subplot(2, 2, i + 1)
#plt.subplots_adjust(wspace=0.4, hspace=0.4)
# Compute the Gram matrix for the test points. This is where the error occurs. xx - test, X - train.
K_test = np.array(laplacian_kernel(xx, X, gamma=.5))
#Predict using the gram matrix for test
Z = svm.predict(np.c_[K_test])
# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.title('SVC with Laplace kernel')
plt.show()
However, when I try to compute the kernel between the grid points and the training points in order to plot the decision boundary, I get the error below.
Traceback (most recent call last):
File "/home/user/Src/laplce.py", line 37, in <module>
K_test = np.array(laplacian_kernel(xx, X, gamma=.5))
File "/home/user/.local/lib/python3.9/site-packages/sklearn/metrics/pairwise.py", line 1136, in laplacian_kernel
X, Y = check_pairwise_arrays(X, Y)
File "/home/user/.local/lib/python3.9/site-packages/sklearn/utils/validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "/home/user/.local/lib/python3.9/site-packages/sklearn/metrics/pairwise.py", line 160, in check_pairwise_arrays
raise ValueError("Incompatible dimension for X and Y matrices: "
ValueError: Incompatible dimension for X and Y matrices: X.shape[1] == 280 while Y.shape[1] == 2
So, how do I resolve the error and plot the decision boundary for the iris data? Thanks in advance.

The issue is getting your meshgrid into the same number of columns as the training matrix before applying the Laplacian kernel. So first run the code below to fit the SVM:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.metrics.pairwise import laplacian_kernel
iris_data = load_iris()
X = iris_data.data[:, :2]
y = iris_data.target
h = 0.02
K = laplacian_kernel(X,gamma=.5)
svm = SVC(kernel='precomputed').fit(K, y)
Create the grid like you did:
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
Your original input to the Laplacian kernel had shape (150, 2), so you need to stack xx and yy into two columns as well:
x_test = np.vstack([xx.ravel(),yy.ravel()]).T
K_test = laplacian_kernel(x_test, X, gamma=.5)
Z = svm.predict(K_test)
Z = Z.reshape(xx.shape)
Then plot:
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
The predictions are more or less correct; you can see in the cross-tabulation that classes 1 and 2 are not resolved very well:
import pandas as pd
pd.crosstab(y, svm.predict(K))

col_0   0   1   2
row_0
0      49   1   0
1       0  35  15
2       0  11  39
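As a side note, the same shape rule applies if you evaluate on a held-out split instead of a mesh grid: the test kernel must be computed against the training rows. A minimal sketch, assuming the imports and the X, y from above:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
K_tr = laplacian_kernel(X_tr, gamma=.5)            # (n_train, n_train)
K_te = laplacian_kernel(X_te, X_tr, gamma=.5)      # (n_test, n_train)
svm_split = SVC(kernel='precomputed').fit(K_tr, y_tr)
print(accuracy_score(y_te, svm_split.predict(K_te)))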

Related

SVM: plot decision surface when working with more than 2 features

I am working with scikit-learn's breast cancer dataset, consisting of 30 features.
Following this tutorial for the much less depressing iris dataset, I figured out how to plot the decision surface separating the "benign" and "malignant" categories when considering the dataset's first two features (mean radius and mean texture).
This is what I get:
But how can I represent the hyperplane computed when using all 30 features in the dataset?
I am aware that I cannot plot a graph in 30 dimensions, but I would like to "project" the hyperplane created when running svm.SVC(kernel='linear', C=1).fit(X_train, y_train) onto the 2D scatter plot showing mean texture against mean radius.
I read about using PCA to reduce dimensionality, but I suspect that fitting a "reduced" dataset is not the same as projecting the hyperplane computed over all 30 features onto a 2D plot.
Here is my code so far:
from sklearn import datasets
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn import svm
import numpy as np
#Load dataset
cancer = datasets.load_breast_cancer()
# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.3,random_state=109) # 70% training and 30% test
h = .02 # mesh step
C = 1.0 # Regularisation
clf = svm.SVC(kernel='linear', C=C).fit(X_train[:,:2], y_train) # Linear Kernel
x_min, x_max = X_train[:, 0].min() - 1, X_train[:, 0].max() + 1
y_min, y_max = X_train[:, 1].min() - 1, X_train[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
np.arange(y_min, y_max, h))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
scat=plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train)
legend1 = plt.legend(*scat.legend_elements(),
loc="upper right", title="diagnostic")
plt.xlabel('mean_radius')
plt.ylabel('mean_texture')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.show()
You cannot visualize the decision surface when you have many features: the input space has too many dimensions, and there is no way to plot an N-dimensional surface directly.
I have also written an article about this here:
https://towardsdatascience.com/support-vector-machines-svm-clearly-explained-a-python-tutorial-for-classification-problems-29c539f3ad8?source=friends_link&sk=80f72ab272550d76a0cc3730d7c8af35
However, you can use 2 features and plot nice decision surfaces as follows.
Case 1: 2D plot for 2 features and using the iris dataset
from sklearn.svm import SVC
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
iris = datasets.load_iris()
X = iris.data[:, :2] # we only take the first two features.
y = iris.target
def make_meshgrid(x, y, h=.02):
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    return xx, yy

def plot_contours(ax, clf, xx, yy, **params):
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    out = ax.contourf(xx, yy, Z, **params)
    return out
model = svm.SVC(kernel='linear')
clf = model.fit(X, y)
fig, ax = plt.subplots()
# title for the plots
title = ('Decision surface of linear SVC ')
# Set-up grid for plotting.
X0, X1 = X[:, 0], X[:, 1]
xx, yy = make_meshgrid(X0, X1)
plot_contours(ax, clf, xx, yy, cmap=plt.cm.coolwarm, alpha=0.8)
ax.scatter(X0, X1, c=y, cmap=plt.cm.coolwarm, s=20, edgecolors='k')
ax.set_ylabel('y label here')
ax.set_xlabel('x label here')
ax.set_xticks(())
ax.set_yticks(())
ax.set_title(title)
ax.legend()
plt.show()
Case 2: 3D plot for 3 features and using the iris dataset
from sklearn.svm import SVC
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from mpl_toolkits.mplot3d import Axes3D
iris = datasets.load_iris()
X = iris.data[:, :3] # we only take the first three features.
Y = iris.target
#make it binary classification problem
X = X[np.logical_or(Y==0,Y==1)]
Y = Y[np.logical_or(Y==0,Y==1)]
model = svm.SVC(kernel='linear')
clf = model.fit(X, Y)
# The equation of the separating plane is given by all x so that np.dot(svc.coef_[0], x) + b = 0.
# Solve for w3 (z)
z = lambda x,y: (-clf.intercept_[0]-clf.coef_[0][0]*x -clf.coef_[0][1]*y) / clf.coef_[0][2]
tmp = np.linspace(-5,5,30)
x,y = np.meshgrid(tmp,tmp)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot3D(X[Y==0,0], X[Y==0,1], X[Y==0,2],'ob')
ax.plot3D(X[Y==1,0], X[Y==1,1], X[Y==1,2],'sr')
ax.plot_surface(x, y, z(x,y))
ax.view_init(30, 60)
plt.show()
You can't plot the 30-dimensional data without first transforming it to 2-D. One approach (from the project below) is to embed the data in 2-D with t-SNE and approximate the decision regions with a Voronoi tessellation:
https://github.com/tmadl/highdimensional-decision-boundary-plot
What is a Voronoi Tessellation?
Given a set P := {p1, ..., pn} of sites, a Voronoi Tessellation is a subdivision of the space into n cells, one for each site in P, with the property that a point q lies in the cell corresponding to a site pi iff d(pi, q) < d(pj, q) for i distinct from j. The segments in a Voronoi Tessellation correspond to all points in the plane equidistant to the two nearest sites. Voronoi Tessellations have applications in computer science. - https://philogb.github.io/blog/2010/02/12/voronoi-tessellation/
In geometry, a centroidal Voronoi tessellation (CVT) is a special type of Voronoi tessellation or Voronoi diagram. A Voronoi tessellation is called centroidal when the generating point of each Voronoi cell is also its centroid, i.e. the arithmetic mean or center of mass. It can be viewed as an optimal partition corresponding to an optimal distribution of generators. A number of algorithms can be used to generate centroidal Voronoi tessellations, including Lloyd's algorithm for K-means clustering or Quasi-Newton methods like BFGS. - Wiki
import numpy as np, matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.manifold import TSNE
from sklearn import svm
bcd = load_breast_cancer()
X,y = bcd.data, bcd.target
X_Train_embedded = TSNE(n_components=2).fit_transform(X)
print(X_Train_embedded.shape)
h = .02 # mesh step
C = 1.0 # Regularisation
clf = svm.SVC(kernel='linear', C=C) # Linear Kernel
clf = clf.fit(X,y)
y_predicted = clf.predict(X)
resolution = 100 # 100x100 background pixels
X2d_xmin, X2d_xmax = np.min(X_Train_embedded[:,0]), np.max(X_Train_embedded[:,0])
X2d_ymin, X2d_ymax = np.min(X_Train_embedded[:,1]), np.max(X_Train_embedded[:,1])
xx, yy = np.meshgrid(np.linspace(X2d_xmin, X2d_xmax, resolution), np.linspace(X2d_ymin, X2d_ymax, resolution))
# approximate Voronoi tessellation on a resolution x resolution grid using 1-NN
background_model = KNeighborsClassifier(n_neighbors=1).fit(X_Train_embedded, y_predicted)
voronoiBackground = background_model.predict(np.c_[xx.ravel(), yy.ravel()])
voronoiBackground = voronoiBackground.reshape((resolution, resolution))
#plot
plt.contourf(xx, yy, voronoiBackground)
plt.scatter(X_Train_embedded[:,0], X_Train_embedded[:,1], c=y)
plt.show()

How to print the classified points based on SVM classifier

I was using "svm" classifier to classify it was a bike or car.
So, my features were 0,1,2 columns and dependents was 3rd column.I can able to clearly see the classification,but i don't know how to print all the points based on classification in diagram.
import numpy as np
import operator
import pandas as pd
from matplotlib import pyplot as plt
from sklearn import svm
from matplotlib.colors import ListedColormap
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.svm import SVC

dataframe = pd.read_csv(DATASET_PATH)
dataframe = dataframe.dropna(how='any', axis=0)
SVM_Trained_Model = preprocessing.LabelEncoder()
train_data = dataframe[0:len(dataframe)]
le = preprocessing.LabelEncoder()
col = dataframe.columns[START_TRAIN_COLUMN:].astype('U')
col_name = ["no_of_wheels", "dimensions", "windows", "vehicle_type"]
for i in range(0, len(col_name)):
    train_data[col_name[i]] = le.fit_transform(train_data[col_name[i]])
train_column = np.array(train_data[col]).astype('U')
data = train_data.iloc[:, [0, 1, 2]].values
target = train_data.iloc[:, 3].values
# split into train and test sets
data_train, data_test, target_train, target_test = train_test_split(data, target, test_size=0.30,
                                                                    random_state=0)
# classifier model
svc_model = SVC(kernel='rbf', probability=True)
svc_model.fit(data_train, target_train)
all_labels = svc_model.predict(data_test)
X_set, y_set = data_train, target_train
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
# pad the grid with a zero column for the third feature so the 3-feature model can predict
Xpred = np.array([X1.ravel(), X2.ravel()] + [np.repeat(0, X1.ravel().size) for _ in range(1)]).T
pred = svc_model.predict(Xpred).reshape(X1.shape)
plt.contourf(X1, X2, pred, alpha=0.75, cmap=ListedColormap(('white', 'orange', 'pink')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
colors = ['red', 'yellow', 'cyan', 'blue']
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap((colors[i]))(i), label=j)
plt.title('Multiclass Classifier')
plt.xlabel('Features')
plt.ylabel('Dependents')
plt.legend()
plt.show()
[figure: decision-region plot with white/orange/pink background and the colored training points]
So here is my diagram. I need to print (with Python's print()) the points that fall in the pink and white regions of the diagram. Please help me to get these points.
You need to select and use only 2 features in order to make a 2D surface plot.
from sklearn.svm import SVC
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
iris = datasets.load_iris()
X = iris.data[:, :2] # we only take the first two features.
y = iris.target
def make_meshgrid(x, y, h=.02):
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    return xx, yy

def plot_contours(ax, clf, xx, yy, **params):
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    out = ax.contourf(xx, yy, Z, **params)
    return out
model = svm.SVC(kernel='linear')
clf = model.fit(X, y)
fig, ax = plt.subplots()
# title for the plots
title = ('Decision surface of linear SVC ')
# Set-up grid for plotting.
X0, X1 = X[:, 0], X[:, 1]
xx, yy = make_meshgrid(X0, X1)
plot_contours(ax, clf, xx, yy, cmap=plt.cm.coolwarm, alpha=0.8)
ax.scatter(X0, X1, c=y, cmap=plt.cm.coolwarm, s=20, edgecolors='k')
ax.set_ylabel('y label here')
ax.set_xlabel('x label here')
ax.set_xticks(())
ax.set_yticks(())
ax.set_title(title)
ax.legend()
plt.show()
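To answer the original question more directly, you can print the points grouped by the class the classifier assigned to them. A minimal sketch using the variables from the question's code (data_test and all_labels are assumed to exist as defined there):
import numpy as np

# Print the test points grouped by the class the SVM assigned to them.
for label in np.unique(all_labels):
    print("Points classified as", label)
    print(data_test[all_labels == label])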

Shape error as I try to plot the decision boundary

From my wine dataset, I am trying to plot a decision boundary between 2 columns, as described by the snippet:
X0, X1 = X[:, 10], Y
I have taken the following code from the scikit-learn SVM plot tutorial and modified it to use my variable names/indices. However, when I run the code, I get an error saying:
ValueError: X.shape[1] = 2 should be equal to 11, the number of features at training time
with error stack as:
Traceback (most recent call last):
File "test-wine.py", line 120, in <module>
cmap=plt.cm.coolwarm, alpha=0.8)
File "test-wine.py", line 96, in plot_contours
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
File "/home/suhail/anaconda3/envs/ml/lib/python3.5/site-packages/sklearn/svm/base.py", line 548, in predict
y = super(BaseSVC, self).predict(X)
File "/home/suhail/anaconda3/envs/ml/lib/python3.5/site-packages/sklearn/svm/base.py", line 308, in predict
X = self._validate_for_predict(X)
File "/home/suhail/anaconda3/envs/ml/lib/python3.5/site-packages/sklearn/svm/base.py", line 459, in _validate_for_predict
(n_features, self.shape_fit_[1]))
ValueError: X.shape[1] = 2 should be equal to 11, the number of features at training time
I cannot understand the reason for the above error. Here is the code that I have modified.
import pandas as pd
from sklearn.svm import SVC
import matplotlib.pyplot as plt
import numpy as np
data = pd.read_csv('winequality-red.csv').values
x_data_shape = data.shape[0]
y_data_shape = data.shape[1]
X = data[:, 0:y_data_shape-1]
Y = data[:, y_data_shape-1]
############### PLOT DECISION BOUNDARY SVM #############
def make_meshgrid(x, y, h=.02):
    """Create a mesh of points to plot in

    Parameters
    ----------
    x: data to base x-axis meshgrid on
    y: data to base y-axis meshgrid on
    h: stepsize for meshgrid, optional

    Returns
    -------
    xx, yy : ndarray
    """
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    return xx, yy

def plot_contours(ax, clf, xx, yy, **params):
    """Plot the decision boundaries for a classifier.

    Parameters
    ----------
    ax: matplotlib axes object
    clf: a classifier
    xx: meshgrid ndarray
    yy: meshgrid ndarray
    params: dictionary of params to pass to contourf, optional
    """
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    out = ax.contourf(xx, yy, Z, **params)
    return out
C = 1.0 # SVM regularization parameter
models = (SVC(kernel='linear', C=C),
SVC(kernel='rbf', gamma=0.7, C=C),
SVC(kernel='poly', degree=3, C=C))
models = (clf.fit(X, Y) for clf in models)
titles = ('SVC with linear kernel',
'SVC with RBF kernel',
'SVC with polynomial (degree 3) kernel')
fig, sub = plt.subplots(2, 2)
plt.subplots_adjust(wspace=0.4, hspace=0.4)
X0, X1 = X[:, 10], Y
xx, yy = make_meshgrid(X0, X1)
for clf, title, ax in zip(models, titles, sub.flatten()):
    plot_contours(ax, clf, xx, yy,
                  cmap=plt.cm.coolwarm, alpha=0.8)
    ax.scatter(X0, X1, c=Y, cmap=plt.cm.coolwarm, s=20, edgecolors='k')
    ax.set_xlim(xx.min(), xx.max())
    ax.set_ylim(yy.min(), yy.max())
    ax.set_xlabel('Alcohol Content')
    ax.set_ylabel('Quality')
    ax.set_xticks(())
    ax.set_yticks(())
    ax.set_title(title)
plt.show()
What could be the reason for this error?
You trained the classifiers with all 11 features, but you provide only 2 features when the classifier is evaluated, which happens when Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]) is called inside the plot_contours function.
To evaluate a classifier trained with 11 features, you need to provide all 11 features. This is what your error message indicates.
So, in order to make the snippet work for you, you should limit yourself to two features (otherwise plotting two-dimensional decision boundaries does not make much sense anyway), e.g. by using
X = data[:, :2]
Y = data[:, y_data_shape-1]
when reading your data.
Note that the example you referred to also uses only two features:
# import some data to play with
iris = datasets.load_iris()
# Take the first two features. We could avoid this by using a two-dim dataset
X = iris.data[:, :2]
y = iris.target

Plot scikit-learn (sklearn) SVM decision boundary / surface

I am currently performing multi-class SVM with a linear kernel using Python's scikit-learn library.
The sample training data and testing data are as given below:
Model data:
x = [[20,32,45,33,32,44,0],[23,32,45,12,32,66,11],[16,32,45,12,32,44,23],[120,2,55,62,82,14,81],[30,222,115,12,42,64,91],[220,12,55,222,82,14,181],[30,222,315,12,222,64,111]]
y = [0,0,0,1,1,2,2]
I want to plot the decision boundary and visualize the dataset. Can someone please help me plot this type of data?
The data given above is just mock data, so feel free to change the values.
It would be helpful if you could at least suggest the steps to be followed.
Thanks in advance.
You have to choose only 2 features to do this, because you cannot draw a 7-D plot. After selecting the 2 features, use only these for the visualization of the decision surface.
(I have also written an article about this here: https://towardsdatascience.com/support-vector-machines-svm-clearly-explained-a-python-tutorial-for-classification-problems-29c539f3ad8?source=friends_link&sk=80f72ab272550d76a0cc3730d7c8af35)
Now, the next question you might ask is: how can I choose these 2 features? Well, there are a lot of ways. You could do a univariate F-value (feature ranking) test and see which features/variables are the most important, then use those for the plot, as sketched below. Alternatively, you could reduce the dimensionality from 7 to 2 using PCA, for example.
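For instance, a minimal sketch of the univariate F-test route with scikit-learn's SelectKBest (X and y here stand for any feature matrix and label vector, e.g. the 7-feature mock data above):
from sklearn.feature_selection import SelectKBest, f_classif

# Rank features by their ANOVA F-value and keep the two strongest ones.
selector = SelectKBest(f_classif, k=2)
X_2d = selector.fit_transform(X, y)
print("Selected feature indices:", selector.get_support(indices=True))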
2D plot for 2 features and using the iris dataset
from sklearn.svm import SVC
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
iris = datasets.load_iris()
# Select 2 features / variable for the 2D plot that we are going to create.
X = iris.data[:, :2] # we only take the first two features.
y = iris.target
def make_meshgrid(x, y, h=.02):
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    return xx, yy

def plot_contours(ax, clf, xx, yy, **params):
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    out = ax.contourf(xx, yy, Z, **params)
    return out
model = svm.SVC(kernel='linear')
clf = model.fit(X, y)
fig, ax = plt.subplots()
# title for the plots
title = ('Decision surface of linear SVC ')
# Set-up grid for plotting.
X0, X1 = X[:, 0], X[:, 1]
xx, yy = make_meshgrid(X0, X1)
plot_contours(ax, clf, xx, yy, cmap=plt.cm.coolwarm, alpha=0.8)
ax.scatter(X0, X1, c=y, cmap=plt.cm.coolwarm, s=20, edgecolors='k')
ax.set_ylabel('y label here')
ax.set_xlabel('x label here')
ax.set_xticks(())
ax.set_yticks(())
ax.set_title(title)
ax.legend()
plt.show()
EDIT: Apply PCA to reduce dimensionality.
from sklearn.svm import SVC
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.decomposition import PCA
iris = datasets.load_iris()
X = iris.data
y = iris.target
pca = PCA(n_components=2)
Xreduced = pca.fit_transform(X)
def make_meshgrid(x, y, h=.02):
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    return xx, yy

def plot_contours(ax, clf, xx, yy, **params):
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    out = ax.contourf(xx, yy, Z, **params)
    return out
model = svm.SVC(kernel='linear')
clf = model.fit(Xreduced, y)
fig, ax = plt.subplots()
# title for the plots
title = ('Decision surface of linear SVC ')
# Set-up grid for plotting.
X0, X1 = Xreduced[:, 0], Xreduced[:, 1]
xx, yy = make_meshgrid(X0, X1)
plot_contours(ax, clf, xx, yy, cmap=plt.cm.coolwarm, alpha=0.8)
ax.scatter(X0, X1, c=y, cmap=plt.cm.coolwarm, s=20, edgecolors='k')
ax.set_ylabel('PC2')
ax.set_xlabel('PC1')
ax.set_xticks(())
ax.set_yticks(())
ax.set_title('Decison surface using the PCA transformed/projected features')
ax.legend()
plt.show()
EDIT 1 (April 15th, 2020):
Case: 3D plot for 3 features and using the iris dataset
from sklearn.svm import SVC
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from mpl_toolkits.mplot3d import Axes3D
iris = datasets.load_iris()
X = iris.data[:, :3] # we only take the first three features.
Y = iris.target
#make it binary classification problem
X = X[np.logical_or(Y==0,Y==1)]
Y = Y[np.logical_or(Y==0,Y==1)]
model = svm.SVC(kernel='linear')
clf = model.fit(X, Y)
# The equation of the separating plane is given by all x so that np.dot(svc.coef_[0], x) + b = 0.
# Solve for w3 (z)
z = lambda x,y: (-clf.intercept_[0]-clf.coef_[0][0]*x -clf.coef_[0][1]*y) / clf.coef_[0][2]
tmp = np.linspace(-5,5,30)
x,y = np.meshgrid(tmp,tmp)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot3D(X[Y==0,0], X[Y==0,1], X[Y==0,2],'ob')
ax.plot3D(X[Y==1,0], X[Y==1,1], X[Y==1,2],'sr')
ax.plot_surface(x, y, z(x,y))
ax.view_init(30, 60)
plt.show()
You can use mlxtend. It's quite clean.
First do a pip install mlxtend, and then:
from sklearn.svm import SVC
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions
svm = SVC(C=0.5, kernel='linear')
svm.fit(X, y)
plot_decision_regions(X, y, clf=svm, legend=2)
plt.show()
Where X is a two-dimensional data matrix, and y is the associated vector of training labels.
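For completeness, here is a minimal end-to-end sketch of the mlxtend route on the iris data (first two features only, so the regions can be drawn in 2-D); it assumes mlxtend is installed:
from sklearn import datasets
from sklearn.svm import SVC
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions

iris = datasets.load_iris()
X = iris.data[:, :2]   # two features so the regions can be drawn in 2-D
y = iris.target

svm = SVC(C=0.5, kernel='linear').fit(X, y)
plot_decision_regions(X, y, clf=svm, legend=2)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.show()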

plotting linear SVM

I tried following the example here, but I am having trouble applying it when I have 16 features. lin_svc is trained with those 16 features (I deleted the line from the example that re-trains it). The trained model works; I tried it and also extracted .coef_ beforehand.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
#features is an array of 16
#lin_svc variable is available
#train is a pandas DF
X = train[features].values  # .as_matrix() was removed in newer pandas
y = train.outcome
h = .02 # step size in the mesh
# create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
np.arange(y_min, y_max, h))
# title for the plots
titles = ['SVC with linear kernel']
for i, clf in enumerate([lin_svc]):
    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    plt.subplot(2, 2, i + 1)
    plt.subplots_adjust(wspace=0.4, hspace=0.4)
    Z = clf.predict(X)
    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
    # Plot also the training points
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
    plt.xlabel('Sepal length')
    plt.ylabel('Sepal width')
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.xticks(())
    plt.yticks(())
    plt.title(titles[i])
plt.show()
The error I am getting is:
ValueError Traceback (most recent call last)
<ipython-input-8-d52ca252fc3a> in <module>()
24
25 # Put the result into a color plot
---> 26 Z = Z.reshape(xx.shape)
27 plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
28
ValueError: total size of new array must be unchanged
I've encountered this same issue myself. Since you're really interested in plotting Z as a function of xx and yy, you should be passing those to clf.predict() rather than passing X. Try replacing
Z = clf.predict(X)
with
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
and the plot should show nicely (assuming no other bugs).
Also you may want to change the title of your question to something like "Plotting 2-D Decision Boundary," since this has nothing to do with SVMs specifically. You'll encounter this kind of issue with any of the sklearn classifiers.
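One caveat, hedged: if lin_svc really was trained on all 16 features, predicting on the two-column grid will still be rejected with a shape mismatch (the same 2-vs-N error as in the wine question above). A rough sketch of one workaround is to hold the remaining features fixed, e.g. at their column means (an assumption on my part, not the only sensible choice):
# Assumes X has shape (n_samples, 16) and the mesh was built from its first two columns.
grid2d = np.c_[xx.ravel(), yy.ravel()]                       # (n_grid, 2)
filler = np.tile(X.mean(axis=0)[2:], (grid2d.shape[0], 1))   # (n_grid, 14)
Z = lin_svc.predict(np.hstack([grid2d, filler]))
Z = Z.reshape(xx.shape)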
