I am very new to matplotlib and am working on simple projects to get acquainted with it. I was wondering how I might use matplotlib to plot the decision boundary given by the weight vector [w1, w2], which separates the two classes, let's say C1 and C2.
Is it as simple as plotting a line from (0,0) to the point (w1,w2), since W is the weight "vector"? If so, how do I extend this line in both directions if I need to?
Right now, all I am doing is:
import matplotlib.pyplot as plt
plt.plot([0,w1],[0,w2])
plt.show()
Thanks in advance.
A decision boundary is generally much more complex than just a line, so (in the 2-dimensional case) it is better to use code for the generic case, which also works well with linear classifiers. The simplest idea is to draw a contour plot of the decision function over a grid:
import numpy as np
import matplotlib.pyplot as plt
# X - some data in a 2-dimensional np.array of shape (n_samples, 2), Y - its class labels
h = 0.02  # step size in the mesh
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
# here "model" is your model's prediction (classification) function
Z = model(np.c_[xx.ravel(), yy.ravel()])
# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
plt.axis('off')
# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=plt.cm.Paired)
plt.show()
Some examples can be found in the sklearn documentation.
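For the purely linear case in the question, note that the boundary is not the segment from (0,0) to (w1,w2). For a linear classifier with weights [w1, w2] and a bias b (take b = 0 if your model has none), the boundary is the set of points where w1*x + w2*y + b = 0, a line perpendicular to the weight vector. A minimal sketch under those assumptions: it computes the line's end points over the plotted x range, so the line extends across the whole plot in both directions. The names linear_boundary and plot_linear_boundary are just illustrative, not part of any library.

```python
import numpy as np


def linear_boundary(w, b, x_range, y_range):
    """End points of the line w[0]*x + w[1]*y + b = 0 inside the plot window.

    The line is perpendicular to w; it is NOT the segment from (0, 0) to w.
    """
    w1, w2 = w
    if w2 != 0:
        xs = np.array(x_range, dtype=float)
        ys = -(w1 * xs + b) / w2          # solve w1*x + w2*y + b = 0 for y
    else:
        ys = np.array(y_range, dtype=float)
        xs = np.full(2, -b / w1)          # vertical boundary x = -b / w1
    return xs, ys


def plot_linear_boundary(X, labels, w, b=0.0):
    """Scatter the 2D data X, draw the boundary line and the weight vector."""
    import matplotlib.pyplot as plt  # imported here so linear_boundary needs only numpy
    x_range = (X[:, 0].min() - 1, X[:, 0].max() + 1)
    y_range = (X[:, 1].min() - 1, X[:, 1].max() + 1)
    xs, ys = linear_boundary(w, b, x_range, y_range)
    plt.scatter(X[:, 0], X[:, 1], c=labels, cmap=plt.cm.Paired)
    plt.plot(xs, ys, "k-")  # the decision boundary
    # the weight vector itself, drawn from the origin for comparison
    plt.quiver(0, 0, w[0], w[1], angles="xy", scale_units="xy", scale=1)
    plt.xlim(*x_range)
    plt.ylim(*y_range)
    plt.show()
```

For example, plot_linear_boundary(X, Y, (w1, w2), b) with your data X, labels Y and trained weights. Points on one side of the line get w1*x + w2*y + b > 0 (class C1, say) and points on the other side get a negative value.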
I'm working on some Python ML exercises and I'm stuck on a question.
I have a dataframe with 7 columns and almost 10k rows. 6 of those columns/variables are objects (strings) and 1 is a float. The 7 variables are: Company, Job, Technologies, Degree, Experience (the one float variable: # of years), City, and Exp_level.
I want to do an unsupervised clustering to show 2 variables I deem important.
The code I've been testing hasn't been working, and the problem seems to be the mix of variable types I have.
x = df
y = x.pop('Metier')
y.unique()
OneHotEncoder().fit(df.dropna()).categories_
x.values, y
for weights in ['uniform', 'distance']:
    # we create an instance of Neighbours Classifier and fit the data.
    clf = KNN.KNeighborsClassifier(5, weights=weights)
    clf.fit(x.values, y.values)
    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.figure()
    plt.pcolormesh(xx, yy, Z, cmap=cmap_light)
    # Plot also the training points
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold,
                edgecolor='k', s=20)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("3-Class classification (k = %i, weights = '%s')"
              % (n_neighbors, weights))
plt.show()
This is the 8th exercise, by the way, so all my imports and dataframe loading were done at the beginning.
The error I keep getting is ValueError: could not convert string to float: 'Sanofi' (the name of a company).
I'm doing my best to train and improve my Python skills, and I hope I gave enough information to show that. Is there a better way to achieve my goal? I can only use the following libraries:
import pandas as pd
import numpy as np
import re
import sklearn as sk
import sklearn.neighbors as KNN
from sklearn.preprocessing import OneHotEncoder
import seaborn as sb
from matplotlib import pyplot as plt
I hope I can figure out this tricky exercise; any help would be greatly appreciated! Thank you in advance :) I'm super happy to be working on my Python skills more and more.
This is my df:
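The ValueError comes from passing raw string columns (such as the company name) to KNeighborsClassifier, which only accepts numeric features. One option within the allowed libraries is to one-hot encode the non-numeric columns first. A minimal sketch using pd.get_dummies, assuming the target column is 'Metier' as in the code above; the helper name encode_features and the toy data are hypothetical. Note also that the mesh-based decision-boundary plot only works with exactly 2 numeric features, so to draw it you would need to pick 2 numeric columns for that plot.

```python
import numpy as np
import pandas as pd


def encode_features(df, target_col):
    """Split off the target column and one-hot encode the non-numeric features.

    Returns a float feature matrix, the target Series and the encoded column names.
    """
    x = df.drop(columns=[target_col])
    y = df[target_col]
    # Non-numeric columns hold strings such as company names ('Sanofi');
    # get_dummies replaces each of them with one 0/1 indicator column per category.
    categorical = x.select_dtypes(exclude="number").columns.tolist()
    x_encoded = pd.get_dummies(x, columns=categorical)
    # Numeric columns (e.g. Experience) are kept unchanged, before the dummy columns.
    return x_encoded.to_numpy(dtype=float), y, list(x_encoded.columns)


# Hypothetical toy data with the same kinds of columns as the real df
toy = pd.DataFrame({
    "Company": ["Sanofi", "Total", "Sanofi"],
    "Experience": [1.0, 3.5, 2.0],
    "Metier": ["Data scientist", "Data engineer", "Data scientist"],
})
X_enc, y_enc, cols = encode_features(toy, "Metier")
print(cols)
print(X_enc)
```

The returned matrix can then be passed to clf.fit(X_enc, y_enc). Unlike get_dummies, a fitted OneHotEncoder remembers the categories it saw, which matters if you later encode new rows separately.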
I am referring to the code example here (http://scikit-learn.org/stable/auto_examples/linear_model/plot_iris_logistic.html), and I am specifically confused by the line iris.data[:, :2]. Since iris.data has 150 rows and 4 columns, I think it means "select all rows and the first two columns". I am asking here to confirm whether my understanding is correct, since I spent some time but could not find a definition of this syntax in the official documentation.
Another question: I am using the following code to get the number of rows and the number of columns. Is there a better, more elegant way? My code uses native Python style, and I am not sure whether numpy has a better way to get these values.
print(len(iris.data))     # number of rows
print(len(iris.data[0]))  # number of columns
I am using Python 2.7 with the Miniconda interpreter.
print(__doc__)
# Code source: Gaël Varoquaux
# Modified for documentation by Jaques Grobler
# License: BSD 3 clause
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model, datasets
# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2] # we only take the first two features.
Y = iris.target
h = .02 # step size in the mesh
logreg = linear_model.LogisticRegression(C=1e5)
# fit the logistic regression classifier to the data.
logreg.fit(X, Y)
# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = logreg.predict(np.c_[xx.ravel(), yy.ravel()])
# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure(1, figsize=(4, 3))
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)
# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=Y, edgecolors='k', cmap=plt.cm.Paired)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.show()
Regards,
Lin
You are right: the slice selects all rows and the first 2 columns/features. Another way to query the dimensions is iris.data.shape, which returns a tuple with the length of each dimension, here (150, 4). You can find the relevant documentation here: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
import numpy as np
x = np.random.rand(100, 200)
# Select all rows and the first 2 columns
y = x[:, :2]
# Number of rows
print(y.shape[0])
# Number of columns
print(y.shape[1])
# Number of dimensions (equivalently y.ndim)
print(len(y.shape))
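To make the slicing rule concrete: in x[:, :2] the part before the comma indexes rows and the part after indexes columns, and :2 is an ordinary Python slice meaning indices 0 and 1. A small check on an array with the same shape as iris.data (the values are dummy numbers, not the real measurements), which also shows the ndim attribute:

```python
import numpy as np

data = np.arange(150 * 4).reshape(150, 4)  # same shape as iris.data, dummy values

first_two = data[:, :2]   # all rows, columns 0 and 1
print(first_two.shape)    # (150, 2)
print(data.shape)         # (150, 4): (rows, columns)
print(data.ndim)          # 2
# The slice is equivalent to an explicit range or an explicit list of columns
print(np.array_equal(first_two, data[:, 0:2]))     # True
print(np.array_equal(first_two, data[:, [0, 1]]))  # True
```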
I have programmed a multilayer perceptron for binary classification. As I understand it, a network with one hidden layer can be represented using just lines as decision boundaries (one line per hidden neuron). This works well and is easy to plot using only the weights obtained after training.
However, as more layers are added, I'm not sure which approach to use, and textbooks rarely cover the visualization part. Is there a straightforward way to transform the weight matrices of the different layers into this non-linear decision boundary (assuming 2D inputs)?
Many thanks,
One approach to plotting decision boundaries (for both linear and non-linear classifiers) is to sample points on a uniform grid and feed them to the classifier. Assuming X is your data, you can create a uniform grid of points as follows:
import numpy as np
h = .02  # step size in the mesh
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
Then, feed those coordinates to your perceptron to capture its predictions:
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Assuming clf is your Perceptron, np.c_ stacks the flattened grid coordinates column-wise into an array of shape (n_points, 2), one row per grid point; clf.predict then classifies every point, and Z holds the predictions.
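To see what np.c_ and reshape do here, a tiny grid helps: np.c_ stacks the two flattened coordinate arrays as columns, so each row is one (x, y) point, and reshape puts the predictions back onto the grid so that Z[i, j] belongs to the point (xx[i, j], yy[i, j]). The rule "class 1 if x > y" below is a hypothetical stand-in for clf.predict:

```python
import numpy as np

xx, yy = np.meshgrid(np.arange(3), np.arange(2))  # grid with 2 rows x 3 columns
points = np.c_[xx.ravel(), yy.ravel()]            # shape (6, 2), one point per row
print(points.tolist())  # [[0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [2, 1]]


def fake_predict(p):
    # hypothetical classifier: label 1 where x > y, else 0
    return (p[:, 0] > p[:, 1]).astype(int)


Z = fake_predict(points).reshape(xx.shape)  # back to the (2, 3) grid shape
print(Z.tolist())  # [[0, 1, 1], [0, 0, 1]]
```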
Finally, plot the decision boundaries as a contour plot (using matplotlib):
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
Optionally, also plot your data points:
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
A fully working example is available in the scikit-learn documentation, which deserves the credit for this code (scikit-learn, by the way, is a great machine learning library and includes a working Perceptron implementation).
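Regarding the weight matrices in the question: the grid approach does not require a predict method. If you only have the trained weights, you can run the network's forward pass on the grid points yourself; the non-linear boundary of a multi-layer network then emerges from the grid predictions, with no need to combine per-neuron lines analytically. A minimal numpy sketch, assuming each layer is stored as a (W, b) pair with W of shape (n_in, n_out), tanh hidden activations and a single sigmoid output unit; adapt the activations to whatever your network actually uses. The function names mlp_predict and decision_grid are illustrative:

```python
import numpy as np


def mlp_predict(points, layers):
    """Forward pass of a fully connected binary classifier.

    points: array of shape (n_points, 2)
    layers: list of (W, b) pairs, W of shape (n_in, n_out), b of shape (n_out,)
    Assumes tanh hidden layers and a single sigmoid output unit.
    """
    a = points
    for W, b in layers[:-1]:
        a = np.tanh(a @ W + b)                     # hidden layers
    W, b = layers[-1]
    out = 1.0 / (1.0 + np.exp(-(a @ W + b)))       # sigmoid output, shape (n_points, 1)
    return (out[:, 0] > 0.5).astype(int)           # class 1 if probability > 0.5


def decision_grid(X, predict, h=0.02, margin=1.0):
    """Evaluate predict on a uniform grid covering the 2D data X."""
    x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
    y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = predict(np.c_[xx.ravel(), yy.ravel()])
    return xx, yy, Z.reshape(xx.shape)
```

Then plot as before: xx, yy, Z = decision_grid(X, lambda p: mlp_predict(p, layers)) followed by plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8). The number of layers does not matter, since only the forward pass changes.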
I am trying to determine the discriminant function and plot the decision boundary for a given data set. My data set has three observations, let's say X1, X2 and X3, each with two features. I was able to plot the data set.
Now I am able to calculate the mean and covariance of the data and hence derive the P(x|C) function. But when I plot the decision boundary using the following code, I get wrong results:
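import numpy as np
import matplotlib.pyplot as plt
# XX is my data set; A, B, C, D, E, F are the coefficients of my discriminant function
x_min, x_max = XX[:, 0].min() - 1, XX[:, 0].max() + 1
y_min, y_max = XX[:, 1].min() - 1, XX[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, .02),
                     np.arange(y_min, y_max, .02))
Z = A*(xx**2)-B*(yy**2)-C*xx*yy-D*xx-E*yy+F
plt.contourf(xx, yy, Z)
plt.show()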
I would appreciate any guidance in correcting my mistakes.
Thanks in advance.
Shashank
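One likely problem is that contourf colors every level of the function, whereas the decision boundary is only the set where two classes' discriminants are equal, i.e. the zero level of their difference; plt.contour(xx, yy, G, levels=[0]) draws just that curve. A sketch assuming Gaussian class-conditional densities with the means, covariances and priors you estimated, using the discriminant g(x) = -1/2 (x-m)^T S^-1 (x-m) - 1/2 ln|S| + ln P(C) (the log of N(x|m,S)P(C) without the constant shared by all classes). The helper names and the (mean, cov, prior) tuples are hypothetical; with three classes you would compare each pair, or color each grid point by the class with the largest g:

```python
import numpy as np


def gaussian_discriminant(points, mean, cov, prior):
    """g(x) = -1/2 (x-m)^T S^-1 (x-m) - 1/2 ln|S| + ln P(C) for each row of points."""
    diff = points - mean
    inv = np.linalg.inv(cov)
    # Squared Mahalanobis distance (x-m)^T S^-1 (x-m), one value per point
    mahal = np.einsum("ij,jk,ik->i", diff, inv, diff)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * mahal - 0.5 * logdet + np.log(prior)


def boundary_grid(XX, class1, class2, h=0.02):
    """Grid over XX and G = g1 - g2; class1/class2 are (mean, cov, prior) tuples."""
    x_min, x_max = XX[:, 0].min() - 1, XX[:, 0].max() + 1
    y_min, y_max = XX[:, 1].min() - 1, XX[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    points = np.c_[xx.ravel(), yy.ravel()]
    G = gaussian_discriminant(points, *class1) - gaussian_discriminant(points, *class2)
    return xx, yy, G.reshape(xx.shape)


def plot_boundary(XX, class1, class2):
    import matplotlib.pyplot as plt  # imported here so the helpers above need only numpy
    xx, yy, G = boundary_grid(XX, class1, class2)
    plt.contour(xx, yy, G, levels=[0], colors="k")  # only the curve where g1 == g2
    plt.scatter(XX[:, 0], XX[:, 1])
    plt.show()
```

G > 0 marks the region assigned to class 1 and G < 0 the region of class 2, so the zero contour is the boundary between them.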