Problem in plotting support vector contours in Jupyter - python

I am following this script from ScikitLearn to plot the margins and the hyperplane for SVC. I'm executing each line in a different cell and am following the same order, i.e. cell 1 for line 1, cell 2 for line 2, and so on. When I finally get to the plotting part, say on cell k(which is the last cell in my notebook that holds the last 7 lines of the code), I get UserWarning: No contour levels were found within the data range. and no plot as mentioned in the link given above shows up.
However, when I execute all the lines in the same cell, the code works as expected. What am I doing wrong here?
The code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_blobs
# we create 40 separable points
X, y = make_blobs(n_samples=40, centers=2, random_state=6)
# fit the model, don't regularize for illustration purposes
clf = svm.SVC(kernel='linear', C=1000)
clf.fit(X, y)
plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)
# plot the decision function
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
# create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = clf.decision_function(xy).reshape(XX.shape)
# plot decision boundary and margins
ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5,
linestyles=['--', '-', '--'])
# plot support vectors
ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=100,
linewidth=1, facecolors='none', edgecolors='k')
plt.show()

Related

Plotting a logisitc regression superimposed over a probability chart

I want to build a chart similar to this
I have created a bar chart, and I have the logistic regression completed.
#imports
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
plt.bar(prob_df['diff'], prob_df['full_win_prob'])
plt.show()
#logistic regression
X = dfx['home_diff'].values
y = dfx['away_win'].values
X = X.reshape(-1, 1)
logreg = LogisticRegression()
logreg.fit(X, y)
print(logreg.intercept_, logreg.coef_)
[-0.67032214] [[0.04948131]] #results
I have the chart and I have the model, I can't figure out how to plot the model on top of the chart, its a bit frustrating I'm sure the answer is simple. I would prefer an answer in matplotlib but seaborn is also ok.
You can use yy = logreg.predict_proba(XX)[:,1] to plot the logistic curve for an array of x-values. logreg.predict_proba(XX)[:,0] gives the inverted curve, the probability of being 0. logreg.predict(XX) gives the predictions, i.e. the logistic curve rounded to 0 or 1.
Here is an example starting from some generated test data.
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
import numpy as np
np.random.seed(123)
# logistic regression
X = np.random.uniform(1, 9, 20) # dfx['home_diff'].values
y = np.random.choice([0, 1], 20, p=[0.8, 0.2])
y = np.where(X < 5, y, 1 - y) # dfx['away_win'].values
X = X.reshape(-1, 1)
plt.scatter(X[:, 0], y, color='black', label='Given values')
logreg = LogisticRegression()
logreg.fit(X, y)
XX = np.linspace(0, 10, 1000).reshape(-1, 1)
prediction = logreg.predict(XX)
probability_0, probability_1 = logreg.predict_proba(XX).T
plt.plot(XX[:, 0], prediction, color='limegreen', lw=2, alpha=0.7, label='Predicted values')
plt.plot(XX[:, 0], probability_0, color='crimson', lw=2, alpha=0.7, ls='--', label='Probability of being 0')
plt.plot(XX[:, 0], probability_1, color='deepskyblue', lw=2, alpha=0.7, label='Probability of being 1')
plt.legend()
Thanks to JohanCs help I was able to get it. Clearly my answer is heavily inspired by his posting. I adjusted the code a bit so it displays the model over the bar graph rather than the scatterplot. I tried to upvote you but I don't have enough karma.
Here is what I used and the result:
# logistic regression
X = dfx['home_diff'].values
y = dfx['away_win'].values
X = X.reshape(-1, 1)
logreg = LogisticRegression()
logreg.fit(X, y)
prediction = logreg.predict(X)
probability_0, probability_1 = logreg.predict_proba(X).T
plt.plot(X[:, 0], probability_1, color='blue', lw=2, alpha=0.6, label='Probability of being 1')
plt.bar(prob_df['diff'], prob_df['full_win_prob'], color='orange')
plt.legend()

How to have multiple categorical markers on a scatterplot

I want to train logistic regression model, and after that create a plot which shows boundary lines, but in specific way.
My work so far
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from matplotlib.colors import ListedColormap
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])
# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2] # we only take the first two features.
Y = iris.target
logreg = LogisticRegression(C=1e5)
# Create an instance of Logistic Regression Classifier and fit the data.
logreg.fit(X, Y)
# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
h = .02 # step size in the mesh
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = logreg.predict(np.c_[xx.ravel(), yy.ravel()])
# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure(1, figsize=(4, 3))
plt.pcolormesh(xx, yy, Z, cmap=cmap_light)
# Plot also the training points
plt.scatter(X[:, 0], X[:,1], c=Y, marker='x',edgecolors='k', cmap=cmap_bold)
plt.xlabel('Sepal length'),
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.show()
However I find it very unreadable. I want to have other markers for each classification and legend in left upper corner. Just like in the image below :
Do you have any idea how can I change that ? I played with marker ='s', marker='x', but those change all points on scatter plot, instead of one specific classification.
Since you are plotting with categorical values, you can just plot each class separately:
# Replace this
# plt.scatter(X[:, 0], X[:,1], c=Y, marker='x',edgecolors='k', cmap=cmap_bold)
# with this
markers = 'sxo'
for m,i in zip(markers,np.unique(Y)):
mask = Y==i
plt.scatter(X[mask, 0], X[mask,1], c=cmap_bold.colors[i],
marker=m,edgecolors='k', label=i)
plt.legend()
Output:
I find it easier to create a dataframe from X & Y, and then plot the data points with seaborn.scatterplot.
seaborn is a high-level api for matplotlib
As shown in How to extract the boundary values from k-nearest neighbors predict, the dataframe columns can be used to specify all data for fitting, and x and y min and max.
load and setup the data
import numpy as np
import matplotlib.pyplot as plt # version 3.3.1
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from matplotlib.colors import ListedColormap
import seaborn # versuin 0.11.0
import pandas # version 1.1.3
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])
# seaborn.scatterplot palette parameter takes a list
palette = ['#FF0000', '#00FF00', '#0000FF']
# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2] # we only take the first two features.
Y = iris.target
# add X & Y to dataframe
df = pd.DataFrame(X, columns=iris.feature_names[:2])
df['label'] = Y
# map the number values to the species name and add it to the dataframe
species_map = dict(zip(range(3), iris.target_names))
df['species'] = df.label.map(species_map)
logreg = LogisticRegression(C=1e5)
# Create an instance of Logistic Regression Classifier and fit the data.
logreg.fit(X, Y)
# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
h = .02 # step size in the mesh
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = logreg.predict(np.c_[xx.ravel(), yy.ravel()])
# Put the result into a color plot
Z = Z.reshape(xx.shape)
plot the data
plt.figure(1, figsize=(8, 6))
plt.pcolormesh(xx, yy, Z, cmap=cmap_light, shading='auto')
# Plot also the training points
# add data points using seaborn
sns.scatterplot(data=df, x='sepal length (cm)', y='sepal width (cm)', hue='species',
style='species', edgecolor='k', alpha=0.5, palette=palette, s=70)
# change legend location
plt.legend(title='Species', loc=2)
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
# plt.xticks(())
# plt.yticks(())
plt.show()
alpha=0.5 is used with sns.scatterplot, to show that some values of 'versicolor' and 'virginica' overlap.
If the species label is desired for the legend, instead of the name, change hue='species' to hue='label'.
You need to change a single call to plt.scatter to one call for each marker type, since matplotlib does not allow passing multiple marker types as it does with color.
The plot code becomes something like
# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure(1, figsize=(4, 3))
plt.pcolormesh(xx, yy, Z, cmap=cmap_light)
# Plot also the training points
X0 = X[Y==0]
X1 = X[Y==1]
X2 = X[Y==2]
Y0 = Y[Y==0]
Y1 = Y[Y==1]
Y2 = Y[Y==2]
plt.scatter(X0[:, 0], X0[:,1], marker='s',color="red")
plt.scatter(X1[:, 0], X1[:,1], marker='x',color="blue")
plt.scatter(X2[:, 0], X2[:,1], marker='o',color="green")
plt.xlabel('Sepal length'),
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.show()
where you individually set the marker type and color of each class. You can also create a list for the marker type and another for the color and use a loop.

How to plot a Python 3-dimensional level set?

I have some trouble plotting the image which is in my head.
I want to visualize the Kernel-trick with Support Vector Machines. So I made some two-dimensional data consisting of two circles (an inner and an outer circle) which should be separated by a hyperplane. Obviously this isn't possible in two dimensions - so I transformed them into 3D. Let n be the number of samples. Now I have an (n,3)-array (3 columns, n rows) X of data points and an (n,1)-array y with labels. Using sklearn I get the linear classifier via
clf = svm.SVC(kernel='linear', C=1000)
clf.fit(X, y)
I already plot the data points as scatter plot via
plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)
Now I want to plot the separating hyperplane as surface plot. My problem here is the missing explicit representation of the hyperplane because the decision function only yields an implicit hyperplane via decision_function = 0. Therefore I need to plot the level set (of level 0) of an 4-dimensional object.
Since I'm not a python expert I would appreciate if somebody could help me out! And I know that this isn't really the "style" of using a SVM but I need this image as an illustration for my thesis.
Edit: my current "code"
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_blobs, make_circles
from tikzplotlib import save as tikz_save
plt.close('all')
# we create 50 separable points
#X, y = make_blobs(n_samples=40, centers=2, random_state=6)
X, y = make_circles(n_samples=50, factor=0.5, random_state=4, noise=.05)
X2, y2 = make_circles(n_samples=50, factor=0.2, random_state=5, noise=.08)
X = np.append(X,X2, axis=0)
y = np.append(y,y2, axis=0)
# shifte X to [0,2]x[0,2]
X = np.array([[item[0] + 1, item[1] + 1] for item in X])
X[X<0] = 0.01
clf = svm.SVC(kernel='rbf', C=1000)
clf.fit(X, y)
plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)
# plot the decision function
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
# create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = clf.decision_function(xy).reshape(XX.shape)
# plot decision boundary and margins
ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--','-','--'])
# plot support vectors
ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=100,
linewidth=1, facecolors='none', edgecolors='k')
################## KERNEL TRICK - 3D ##################
trans_X = np.array([[item[0]**2, item[1]**2, np.sqrt(2*item[0]*item[1])] for item in X])
fig = plt.figure()
ax = plt.axes(projection ="3d")
# creating scatter plot
ax.scatter3D(trans_X[:,0],trans_X[:,1],trans_X[:,2], c = y, cmap=plt.cm.Paired)
clf2 = svm.SVC(kernel='linear', C=1000)
clf2.fit(trans_X, y)
ax = plt.gca(projection='3d')
xlim = ax.get_xlim()
ylim = ax.get_ylim()
zlim = ax.get_zlim()
### from here i don't know what to do ###
xx = np.linspace(xlim[0], xlim[1], 3)
yy = np.linspace(ylim[0], ylim[1], 3)
zz = np.linspace(zlim[0], zlim[1], 3)
ZZ, YY, XX = np.meshgrid(zz, yy, xx)
xyz = np.vstack([XX.ravel(), YY.ravel(), ZZ.ravel()]).T
Z = clf2.decision_function(xyz).reshape(XX.shape)
#ax.contour(XX, YY, ZZ, Z, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--','-','--'])
Desired Output
I want to get something like that.
In general I want to reconstruct what they do in this article, especially "Non-linear transformations".
Part of your question is addressed in this question on linear-kernel SVM. It's a partial answer, because only linear kernels can be represented this way, i.e. thanks to hyperplane coordinates accessible via the estimator when using linear kernel.
Another solution is to find the isosurface with marching_cubes
This solution involves installing the scikit-image toolkit (https://scikit-image.org) which allows to find an isosurface of a given value (here, I considered 0 since it represents the distance to the hyperplane) from the mesh grid of the 3D coordinates.
In the code below (copied from yours), I implement the idea for any kernel (in the example, I used the RBF kernel), and the output is shown beneath the code. Please consider my footnote about 3D plotting with matplotlib, which may be another issue in your case.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from skimage import measure
from sklearn.datasets import make_blobs, make_circles
from tikzplotlib import save as tikz_save
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
plt.close('all')
# we create 50 separable points
#X, y = make_blobs(n_samples=40, centers=2, random_state=6)
X, y = make_circles(n_samples=50, factor=0.5, random_state=4, noise=.05)
X2, y2 = make_circles(n_samples=50, factor=0.2, random_state=5, noise=.08)
X = np.append(X,X2, axis=0)
y = np.append(y,y2, axis=0)
# shifte X to [0,2]x[0,2]
X = np.array([[item[0] + 1, item[1] + 1] for item in X])
X[X<0] = 0.01
clf = svm.SVC(kernel='rbf', C=1000)
clf.fit(X, y)
plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)
# plot the decision function
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
# create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = clf.decision_function(xy).reshape(XX.shape)
# plot decision boundary and margins
ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--','-','--'])
# plot support vectors
ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=100,
linewidth=1, facecolors='none', edgecolors='k')
################## KERNEL TRICK - 3D ##################
trans_X = np.array([[item[0]**2, item[1]**2, np.sqrt(2*item[0]*item[1])] for item in X])
fig = plt.figure()
ax = plt.axes(projection ="3d")
# creating scatter plot
ax.scatter3D(trans_X[:,0],trans_X[:,1],trans_X[:,2], c = y, cmap=plt.cm.Paired)
clf2 = svm.SVC(kernel='rbf', C=1000)
clf2.fit(trans_X, y)
z = lambda x,y: (-clf2.intercept_[0]-clf2.coef_[0][0]*x-clf2.coef_[0][1]*y) / clf2.coef_[0][2]
ax = plt.gca(projection='3d')
xlim = ax.get_xlim()
ylim = ax.get_ylim()
zlim = ax.get_zlim()
### from here i don't know what to do ###
xx = np.linspace(xlim[0], xlim[1], 50)
yy = np.linspace(ylim[0], ylim[1], 50)
zz = np.linspace(zlim[0], zlim[1], 50)
XX ,YY, ZZ = np.meshgrid(xx, yy, zz)
xyz = np.vstack([XX.ravel(), YY.ravel(), ZZ.ravel()]).T
Z = clf2.decision_function(xyz).reshape(XX.shape)
# find isosurface with marching cubes
dx = xx[1] - xx[0]
dy = yy[1] - yy[0]
dz = zz[1] - zz[0]
verts, faces, _, _ = measure.marching_cubes_lewiner(Z, 0, spacing=(1, 1, 1), step_size=2)
verts *= np.array([dx, dy, dz])
verts -= np.array([xlim[0], ylim[0], zlim[0]])
# add as Poly3DCollection
mesh = Poly3DCollection(verts[faces])
mesh.set_facecolor('g')
mesh.set_edgecolor('none')
mesh.set_alpha(0.3)
ax.add_collection3d(mesh)
ax.view_init(20, -45)
plt.savefig('kerneltrick')
Running the code produces the following image with Matplotlib, where the green semi-transparent surface represents the non-linear decision boundary.
Footnote: 3D plotting with matplotlib
Note that Matplotlib 3D is not able to manage the "depth" of objects in some cases, because it can be in conflict with the zorder of this object. This is the reason why sometimes the hyperplane look to be plotted "on top of" the points, even it should be "behind". This issue is a known bug discussed in the matplotlib 3d documentation and in this answer.
If you want to have better rendering results, you may want to use Mayavi, as recommended by the Matplotlib developers, or any other 3D Python plotting library.

How to plot SVM decision boundary in sklearn Python?

Using SVM with sklearn library, I would like to plot the data with each labels representing its color. I don't want to color the points but filling area with colors.
I have now :
d_pred, d_train_std, d_test_std, l_train, l_test
d_pred are the labels predicted.
I would plot d_pred with d_train_std with shape : (70000,2) where X-axis are the first column and Y-Axis the second column.
Thank you.
You cannot visualize the decision surface for a lot of features. This is because the dimensions will be too many and there is no way to visualize an N-dimensional surface.
However, you can use 2 features and plot nice decision surfaces as follows.
I have also written an article about this here:
https://towardsdatascience.com/support-vector-machines-svm-clearly-explained-a-python-tutorial-for-classification-problems-29c539f3ad8?source=friends_link&sk=80f72ab272550d76a0cc3730d7c8af35
Case 1: 2D plot for 2 features and using the iris dataset
from sklearn.svm import SVC
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
iris = datasets.load_iris()
X = iris.data[:, :2] # we only take the first two features.
y = iris.target
def make_meshgrid(x, y, h=.02):
x_min, x_max = x.min() - 1, x.max() + 1
y_min, y_max = y.min() - 1, y.max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
return xx, yy
def plot_contours(ax, clf, xx, yy, **params):
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
out = ax.contourf(xx, yy, Z, **params)
return out
model = svm.SVC(kernel='linear')
clf = model.fit(X, y)
fig, ax = plt.subplots()
# title for the plots
title = ('Decision surface of linear SVC ')
# Set-up grid for plotting.
X0, X1 = X[:, 0], X[:, 1]
xx, yy = make_meshgrid(X0, X1)
plot_contours(ax, clf, xx, yy, cmap=plt.cm.coolwarm, alpha=0.8)
ax.scatter(X0, X1, c=y, cmap=plt.cm.coolwarm, s=20, edgecolors='k')
ax.set_ylabel('y label here')
ax.set_xlabel('x label here')
ax.set_xticks(())
ax.set_yticks(())
ax.set_title(title)
ax.legend()
plt.show()
Case 2: 3D plot for 3 features and using the iris dataset
from sklearn.svm import SVC
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from mpl_toolkits.mplot3d import Axes3D
iris = datasets.load_iris()
X = iris.data[:, :3] # we only take the first three features.
Y = iris.target
#make it binary classification problem
X = X[np.logical_or(Y==0,Y==1)]
Y = Y[np.logical_or(Y==0,Y==1)]
model = svm.SVC(kernel='linear')
clf = model.fit(X, Y)
# The equation of the separating plane is given by all x so that np.dot(svc.coef_[0], x) + b = 0.
# Solve for w3 (z)
z = lambda x,y: (-clf.intercept_[0]-clf.coef_[0][0]*x -clf.coef_[0][1]*y) / clf.coef_[0][2]
tmp = np.linspace(-5,5,30)
x,y = np.meshgrid(tmp,tmp)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot3D(X[Y==0,0], X[Y==0,1], X[Y==0,2],'ob')
ax.plot3D(X[Y==1,0], X[Y==1,1], X[Y==1,2],'sr')
ax.plot_surface(x, y, z(x,y))
ax.view_init(30, 60)
plt.show()
It can be difficult to get the function in 3D. An easy way to get a visualization is to get a large amount of points that cover your point space and run them through your learned function (my_model.predict), keep the points that hit inside the function, and visualize them. The more you add the more defined the boundary will be.
Here's my code that does what #Christian Tuchez describes:
outputs = my_clf.predict(1_test)
hits = []
for i in range(outputs.size):
if outputs[i] == 1:
hits.append(i) # save the index where it's 1
This saves the index of all the points that hit in the function (saved in the "hits" list). You can probably accomplish this without a loop, I just found it easiest for me.
Then to display just those points, you'd write something like this:
ax.scatter(1_test[hits[:], 0], 1_test[hits[:], 1], 1_test[hits[:], 2], c="cyan", s=2, edgecolor=None)

Subplot drawn by auxiliary function experiences unexpected shift

I'm trying to merge two plots in one:
http://scikit-learn.org/stable/auto_examples/linear_model/plot_sgd_iris.html
http://scikit-learn.org/stable/auto_examples/ensemble/plot_voting_decision_regions.html#sphx-glr-auto-examples-ensemble-plot-voting-decision-regions-py
In the left plot I want to display the decision boundary with the hyperplane corresponding to the OVA classifiers and in the right plot I would like to show the decision probabilities.
This is the code so far:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn
from sklearn import datasets
from sklearn import preprocessing
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.svm import SVC
def plot_hyperplane(c, color, fitted_model):
"""
Plot the one-against-all classifiers for the given model.
Parameters
--------------
c : index of the hyperplane to be plot
color : color to be used when drawing the line
fitted_model : the fitted model
"""
xmin, xmax = plt.xlim()
ymin, ymax = plt.ylim()
try:
coef = fitted_model.coef_
intercept = fitted_model.intercept_
except:
return
def line(x0):
return (-(x0 * coef[c, 0]) - intercept[c]) / coef[c, 1]
plt.plot([xmin, xmax], [line(xmin), line(xmax)], ls="--", color=color, zorder=3)
def plot_decision_boundary(X, y, fitted_model, features, targets):
"""
This function plots a model decision boundary as well as it tries to plot
the decision probabilities, if available.
Requires a model fitted with two features only.
Parameters
--------------
X : the data to learn
y : the classification labels
fitted_model : the fitted model
"""
cmap = plt.get_cmap('Set3')
prob = cmap
colors = [cmap(i) for i in np.linspace(0, 1, len(fitted_model.classes_))]
plt.figure(figsize=(9.5, 5))
for i, plot_type in enumerate(['Decision Boundary', 'Decision Probabilities']):
plt.subplot(1, 2, i+1)
mesh_step_size = 0.01 # step size in the mesh
x_min, x_max = X[:, 0].min() - .1, X[:, 0].max() + .1
y_min, y_max = X[:, 1].min() - .1, X[:, 1].max() + .1
xx, yy = np.meshgrid(np.arange(x_min, x_max, mesh_step_size), np.arange(y_min, y_max, mesh_step_size))
# First plot, predicted results using the given model
if i == 0:
Z = fitted_model.predict(np.c_[xx.ravel(), yy.ravel()])
for h, color in zip(fitted_model.classes_, colors):
plot_hyperplane(h, color, fitted_model)
# Second plot, predicted probabilities using the given model
else:
prob = 'RdYlBu_r'
try:
Z = fitted_model.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
except:
plt.text(0.4, 0.5, 'Probabilities Unavailable', horizontalalignment='center',
verticalalignment='center', transform=plt.gca().transAxes, fontsize=12)
plt.axis('off')
break
Z = Z.reshape(xx.shape)
# Display Z
plt.imshow(Z, interpolation='nearest', cmap=prob, alpha=0.5,
extent=(x_min, x_max, y_min, y_max), origin='lower', zorder=1)
# Plot the data points
for i, color in zip(fitted_model.classes_, colors):
idx = np.where(y == i)
plt.scatter(X[idx, 0], X[idx, 1], facecolor=color, edgecolor='k', lw=1,
label=iris.target_names[i], cmap=cmap, alpha=0.8, zorder=2)
plt.title(plot_type + '\n' +
str(fitted_model).split('(')[0]+ ' Test Accuracy: ' + str(np.round(fitted_model.score(X, y), 5)))
plt.xlabel(features[0])
plt.ylabel(features[1])
plt.gca().set_aspect('equal')
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.tight_layout()
plt.subplots_adjust(top=0.9, bottom=0.08, wspace=0.02)
plt.show()
if __name__ == '__main__':
iris = datasets.load_iris()
X = iris.data[:, [0, 2]]
y = iris.target
scaler = preprocessing.StandardScaler().fit_transform(X)
clf1 = DecisionTreeClassifier(max_depth=4)
clf2 = KNeighborsClassifier(n_neighbors=7)
clf3 = SVC(kernel='rbf', probability=True)
clf4 = SGDClassifier(alpha=0.001, n_iter=100).fit(X, y)
clf1.fit(X, y)
clf2.fit(X, y)
clf3.fit(X, y)
clf4.fit(X, y)
plot_decision_boundary(X, y, clf1, iris.feature_names, iris.target_names[[0, 2]])
plot_decision_boundary(X, y, clf2, iris.feature_names, iris.target_names[[0, 2]])
plot_decision_boundary(X, y, clf3, iris.feature_names, iris.target_names[[0, 2]])
plot_decision_boundary(X, y, clf4, iris.feature_names, iris.target_names[[0, 2]])
And the results:
As can be seen, for the last example (clf4 in the given code) I'm so far unable to plot the hyperplane in the wrong position. I wonder how to correct this. They should be translated to the correct range regarding the used features to fit the model.
Thanks.
Apparently, the problem is the ends of dashed lines representing hyperplanes are not consistent with the final and expected xlim and ylim. A good thing about this case is that you already have x_min, x_max, y_min, y_max defined. So use that and fix xlim and ylim by applying the following 3 lines before plotting hyperplanes (specifically, add in front of your comment line # First plot, predicted results using the given model).
ax = plt.gca()
ax.set_xlim((x_min, x_max), auto=False)
ax.set_ylim((y_min, y_max), auto=False)

Categories

Resources