Visualizing the decision boundary of a Keras model - Python

I am trying to plot the decision boundary of a Keras model's predictions. However, the generated boundary looks incorrect.
Here's my model:
from tensorflow.keras import optimizers
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
def base():
    model = Sequential()
    model.add(Dense(5, activation='relu', input_dim=2))
    model.add(Dense(2, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # 'lr' and 'decay' are deprecated SGD arguments; 'learning_rate' is the current name
    model.compile(optimizer=optimizers.SGD(learning_rate=0.0007, momentum=0.0), loss='binary_crossentropy', metrics=['accuracy'])
    return model
model = base()
history = model.fit(train_X, train_Y, epochs=10000, batch_size=64, verbose=2)
And here's my plotting function (taken from here):
import numpy as np
import matplotlib.pyplot as plt
def plot_decision_boundary(X, y, model, steps=1000, cmap='Paired'):
    """
    Function to plot the decision boundary and data points of a model.
    Data points are colored based on their actual label.
    """
    cmap = plt.get_cmap(cmap)
    # Define region of interest by data limits
    xmin, xmax = X[:,0].min() - 1, X[:,0].max() + 1
    ymin, ymax = X[:,1].min() - 1, X[:,1].max() + 1
    x_span = np.linspace(xmin, xmax, steps)
    y_span = np.linspace(ymin, ymax, steps)
    xx, yy = np.meshgrid(x_span, y_span)
    # Make predictions across region of interest
    labels = model.predict(np.c_[xx.ravel(), yy.ravel()])
    # Plot decision boundary in region of interest
    z = labels.reshape(xx.shape)
    fig, ax = plt.subplots()
    ax.contourf(xx, yy, z, cmap=cmap, alpha=0.5)
    # Plot the training data, colored by actual label
    ax.scatter(X[:,0], X[:,1], c=y.ravel(), cmap=cmap, lw=0)
    return fig, ax
plot_decision_boundary(train_X,train_Y, model, cmap = 'RdBu')
And I get a plot like this
This is obviously a very flawed depiction of a decision boundary: it is not informative at all because there are so many boundaries. Can somebody point out the error in my code?

Since the model outputs a probability, which is a continuous value from 0 to 1, contourf draws many contour levels instead of a single boundary.
If your visualization is restricted to 2 classes (the output is a 2-element softmax vector), you can use this simple code:
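A small sketch of the effect, using a synthetic sigmoid surface as a stand-in for model.predict output (an assumption, so no Keras model is needed): contourf picks its own levels from the data range, while passing levels explicitly leaves only two bands split at p = 0.5.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for model.predict output on a grid: a smooth sigmoid "probability" surface
xx, yy = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-4, 4, 200))
z = 1.0 / (1.0 + np.exp(-(xx + yy)))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
# Left: contourf chooses its own levels, so one probability surface becomes many bands
auto = ax1.contourf(xx, yy, z, cmap="RdBu", alpha=0.5)
# Right: explicit levels give exactly two bands, split at p = 0.5
two = ax2.contourf(xx, yy, z, levels=[0.0, 0.5, 1.0], cmap="RdBu", alpha=0.5)
ax2.contour(xx, yy, z, levels=[0.5], colors="k")  # the single decision boundary line
print(auto.levels)  # several automatically chosen levels
```

The same levels argument can be passed to the ax.contourf call inside plot_decision_boundary.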
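import numpy as np
import matplotlib.pyplot as plt
def plot_model_out(x, y, model):
    """
    x, y: 2D meshgrid input
    model: Keras Model API object
    """
    # One (x, y) row per grid point, in the same order as x.ravel()
    # (stacking and transposing would swap the grid axes)
    grid = np.c_[x.ravel(), y.ravel()]
    outs = model.predict(grid)
    # First output column, reshaped back onto the grid
    y1 = outs[:, 0].reshape(x.shape)
    plt.contourf(x, y, y1)
    plt.show()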
This gives several contours. If you want a single contour line, you can threshold the probability output of model.predict. For example:
import numpy as np
from matplotlib import pyplot as plt
a = np.linspace(-5, 5, 100)
xx, yy = np.meshgrid(a,a)
z = xx**2 + yy**2
# z = z > 5 (Threshold value)
plt.contourf(xx, yy, z)
plt.show()
With the threshold line commented out and uncommented, we get two images:
Image 1: Multiple contours due to continuous values
Image 2: A single contour because z is thresholded (z = z > 5)
A similar method can be applied to the model's output probabilities, like this:
label = label > 0.5
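A fuller sketch of the thresholding idea applied to the question's setup, assuming model.predict returns an (n, 1) array of sigmoid probabilities, as the Dense(1, activation='sigmoid') model does. The function name plot_thresholded_boundary and the StandInModel class are hypothetical; the stand-in exists only so the example runs without a trained network.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_thresholded_boundary(X, y, model, steps=200, threshold=0.5, cmap="RdBu"):
    """Fill the two predicted classes and draw the single p = threshold line.

    Assumes model.predict returns an (n, 1) array of probabilities.
    """
    xmin, xmax = X[:, 0].min() - 1, X[:, 0].max() + 1
    ymin, ymax = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(xmin, xmax, steps),
                         np.linspace(ymin, ymax, steps))
    probs = np.asarray(model.predict(np.c_[xx.ravel(), yy.ravel()])).reshape(xx.shape)
    labels = (probs > threshold).astype(int)  # hard 0/1 class per grid point
    fig, ax = plt.subplots()
    # Two bands only: one for label 0, one for label 1
    ax.contourf(xx, yy, labels, levels=[-0.5, 0.5, 1.5], cmap=cmap, alpha=0.5)
    # One smooth line where the probability crosses the threshold
    ax.contour(xx, yy, probs, levels=[threshold], colors="k")
    ax.scatter(X[:, 0], X[:, 1], c=np.ravel(y), cmap=cmap, lw=0)
    return fig, ax, labels

class StandInModel:
    """Hypothetical stand-in for a trained Keras model with a sigmoid output."""
    def predict(self, pts):
        return (1.0 / (1.0 + np.exp(-(pts[:, 0] + pts[:, 1]))))[:, None]

X_demo = np.array([[-1.0, -1.0], [1.0, 1.0], [-0.5, 0.8]])
y_demo = np.array([0, 1, 1])
fig, ax, labels = plot_thresholded_boundary(X_demo, y_demo, StandInModel(), steps=50)
```

With the real model, the call would be plot_thresholded_boundary(train_X, train_Y, model) followed by plt.show(). Filling the 0/1 labels gives two flat regions, and contour on the raw probabilities draws one smooth boundary line.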
For more information about visualization code, refer to the IITM CVI Blog.

Related

Gaussian Fit function anomaly

I have written code to fit a Gaussian function to a dataset with scipy's curve_fit. There are several datasets: one with 19 points and one with 21 points, and both include data in the ranges 0.5-0.7, 1.0-1.2 and 1.5-1.7.
Surprisingly, when I ran the code on the 19-point datasets, all three fits succeeded, but for the 21-point datasets only the 1.5-1.7 range gave the right fit. All the others gave horribly wrong fits.
Here is the code:
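import math
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# x, y (measured data), gas and z (used in the title) are assumed to be defined earlier
#function declaration
def gauss(x, amp, mu, sigma):
    y = amp*np.exp(-(x-mu)**2/(2*sigma**2))
    return y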
#fitting
popt, pcov = curve_fit(f = gauss, xdata = x, ydata = y)
#print(popt)
amp = popt[0]
mu = popt[1]
sigma = popt[2]
print(amp,mu,sigma)
#krypton value
krypton_y = amp/((math.exp(1))**2)
#print(krypton_y)
krypton_x1 = mu + math.sqrt((-2*(sigma**2))*math.log(krypton_y/amp))
krypton_x2 = mu - math.sqrt((-2*(sigma**2))*math.log(krypton_y/amp))
print(krypton_x1-krypton_x2)
#print(gauss([krypton_x1, krypton_x2], popt[0], popt[1], popt[2]))
#horizontal line
horizontal_x = np.arange(min(x)-0.01, max(x)+0.02, 0.01)
horizontal_y = np.repeat(0, len(horizontal_x))
#build fit set
x_test = np.arange(min(x), max(x), 0.0000001)
y_test = gauss(x_test, popt[0], popt[1], popt[2])
y_krypton = []
for i in horizontal_x:
    y_krypton.append(krypton_y)
#Vertical lines
vertical_y = np.arange(-20, amp+20, 0.01)
l = len(vertical_y)
vertical_mean = np.repeat(mu, l)
#fit data
fig = plt.figure()
fig = plt.scatter(x,y, label ='original data', color = 'red', marker = 'x')
fig = plt.plot(x_test, y_test, label = 'Gaussian fit curve')
fig = plt.plot(horizontal_x, y_krypton, color = '#830000', linewidth = 1)
fig = plt.plot(vertical_mean, vertical_y, color = '#0011ed')
fig = plt.xlabel('Distance in mm')
fig = plt.ylabel('Current in nA')
fig = plt.title('Intensity Profile for '+gas+' laser | Z = '+str(z)+'cm')
fig = plt.scatter(mu, amp, s = 25, color = '#0011ed')
fig = plt.scatter(krypton_x1, krypton_y, s = 25, color = '#830000')
fig = plt.scatter(krypton_x2, krypton_y, s = 25, color = '#830000')
plt.annotate('('+"{:.4f}".format(mu)+','+"{:.4f}".format(amp)+')', (mu, amp), xytext = (mu+0.002,amp+0.5))
plt.annotate('('+"{:.4f}".format(krypton_x1)+','+"{:.4f}".format(krypton_y)+')', (krypton_x1, krypton_y), xytext = (krypton_x1+0.002,krypton_y+0.5))
plt.annotate('('+"{:.4f}".format(krypton_x2)+','+"{:.4f}".format(krypton_y)+')', (krypton_x2, krypton_y), xytext = (krypton_x2+0.002,krypton_y+0.5))
plt.legend()
plt.margins(0)
plt.show()
I am also adding two images: the correct fit and the wrong fit.
To make the difficulty clear, we will use an elementary regression method.
We see that the fit involves ln(y), which is infinite at the points k<6 and k>16 (where y = 0). Those points cannot be used in the numerical calculation. The point k=16 is also unreliable, because its small value y=0.001 is not accurate enough (it has only one significant digit). So we use only the points from k=6 to k=15 in the following calculation.
This shows that the insignificant points have to be eliminated. Of course, the more sophisticated iterative methods implemented in nonlinear regression packages give a better fit according to the particular fitting criteria specified in the software.
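The answer's own derivation is not reproduced here, so the following is one plausible way to implement an elementary regression of this kind: since ln y = ln(amp) - (x - mu)^2 / (2 sigma^2) is a parabola in x, a least-squares quadratic through ln(y) on the usable points gives closed-form estimates. The sketch then feeds those estimates to curve_fit as p0. According to the SciPy documentation, curve_fit starts from all parameters equal to 1 when p0 is omitted, which is far from data with sigma of a few hundredths; that is a common cause of failed Gaussian fits, though the thread does not confirm it as the cause here. The helper name log_parabola_guess and the synthetic data are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss(x, amp, mu, sigma):
    return amp * np.exp(-(x - mu)**2 / (2 * sigma**2))

def log_parabola_guess(x, y, min_y=0.0):
    """Elementary estimate of (amp, mu, sigma) from a least-squares parabola
    through ln(y). Points with y <= min_y are dropped: ln(y) is undefined at
    y = 0, and very small readings carry few significant digits."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = y > min_y
    # ln y = a + b*x + c*x**2, with c = -1/(2*sigma**2) and b = mu/sigma**2
    c, b, a = np.polyfit(x[keep], np.log(y[keep]), 2)
    if c >= 0:
        raise ValueError("ln(y) is not concave; the data do not look Gaussian")
    sigma = np.sqrt(-1.0 / (2.0 * c))
    mu = -b / (2.0 * c)
    amp = np.exp(a - c * mu**2)
    return amp, mu, sigma

# Synthetic data (assumed values, not the poster's measurements)
x = np.linspace(0.5, 0.7, 21)
y = 8.0 * np.exp(-(x - 0.61)**2 / (2 * 0.015**2))
y[y < 1e-3] = 0.0  # mimic readings that drop to zero in the tails

p0 = log_parabola_guess(x, y)
# Seed the nonlinear fit with the elementary estimate instead of the default ones
popt, pcov = curve_fit(gauss, x, y, p0=p0)
amp, mu, sigma = popt[0], popt[1], abs(popt[2])  # sigma enters squared, so its sign is arbitrary
```

Raising min_y drops unreliable small readings, in the spirit of discarding the k=16 point above.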

How to plot KNN decision boundary in Python from scratch?

I need to plot the decision boundary for KNN without using sklearn. I have implemented the classifier, but I am not able to plot the decision boundary. The plot should look like the one in the book "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" (Second Edition, often called ElemStatLearn) by Trevor Hastie, Robert Tibshirani and Jerome Friedman. The required plot is shown below:
KNN k=15 classifier Original
So far, I have only been able to produce the plot below:
KNN k=15 classifier Plot produced so far
I have computed the grid points and the predictions at those points. I also tried to find the boundary points by checking where a prediction differs from the prediction at the previous grid point, and then sorted those points. But when I plot them, the result does not look like the required plot.
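Two things in the flattened comparison are worth noting: consecutive entries of grid_yhat also pair the last point of one grid row with the first point of the next row, and only changes along x are detected. In addition, joining boundary points sorted by x with plot() zig-zags whenever the boundary curves back on itself. A minimal sketch of a neighbour comparison done on the 2D grid instead, assuming the predictions are reshaped to the meshgrid shape (so get_grid would also need to return xx and yy); boundary_midpoints is a hypothetical helper:

```python
import numpy as np

def boundary_midpoints(xx, yy, labels):
    """Midpoints between neighbouring grid cells whose predicted labels differ.

    xx, yy: 2D arrays from np.meshgrid; labels: predictions with the same shape
    (for example grid_yhat.reshape(xx.shape)).
    """
    # Neighbours within each row: column j vs column j+1 (no wrap-around between rows)
    h = labels[:, 1:] != labels[:, :-1]
    horiz = np.c_[(xx[:, 1:][h] + xx[:, :-1][h]) / 2, yy[:, 1:][h]]
    # Neighbours within each column: row i vs row i+1
    v = labels[1:, :] != labels[:-1, :]
    vert = np.c_[xx[1:, :][v], (yy[1:, :][v] + yy[:-1, :][v]) / 2]
    return np.vstack([horiz, vert])

# Toy grid whose true boundary is the vertical line x = 0.05
xx, yy = np.meshgrid(np.linspace(-1, 1, 21), np.linspace(-1, 1, 21))
labels = (xx > 0.05).astype(int)
pts = boundary_midpoints(xx, yy, labels)
```

The midpoints are not ordered along the curve, so they are best drawn with scatter rather than joined with plot.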
import numpy as np
from numpy import arange, meshgrid, hstack
def get_grid(X):
    # Creating grids for decision surface
    ## Define bounds of the surface
    min1, max1 = X[:, 0].min() - 0.2, X[:, 0].max() + 0.2
    min2, max2 = X[:, 1].min() - 0.2, X[:, 1].max() + 0.2
    ## Define the x and y points
    x1grid = arange(min1, max1, 0.1)
    x2grid = arange(min2, max2, 0.1)
    ## Create all of the lines and rows of the grid
    xx, yy = meshgrid(x1grid, x2grid)
    ## Flatten each grid to a vector
    r1, r2 = xx.flatten(), yy.flatten()
    r1, r2 = r1.reshape((len(r1), 1)), r2.reshape((len(r2), 1))
    ## Horizontally stack vectors to create x1, x2 input for the model
    grid_X = hstack((r1, r2))
    return grid_X
X, y = data[:, :-1], data[:, -1].astype(int)
# Custom class defined
model = KNNClassifier(num_neighbors = 5)
model.fit(X, y)
y_pred = model.predict(X)
grid_X = get_grid(X)
grid_yhat = model.predict(grid_X)
boundary = []
for i in range(1, len(grid_X)):
    # Compare each grid point with the previous one in flattened order
    if grid_yhat[i] != grid_yhat[i-1]:
        boundary.append((grid_X[i] + grid_X[i-1]) * 0.5)
boundary_x = [b[0] for b in boundary]
boundary_y = [b[1] for b in boundary]
order = np.argsort(boundary_x)
boundary_x = np.array(boundary_x)[order]
boundary_y = np.array(boundary_y)[order]
from matplotlib.pyplot import axis, figure, plot, scatter, show
def plot_decision_surface(X, y, boundary_X, boundary_y, grid_X, grid_yhat):
    figure(figsize=(10,10))
    axis('off')
    # Plot the ground truth data points in the 2D feature space
    # (split_X is the poster's own helper, not shown)
    X_pos, X_neg = split_X(X, y)
    scatter(X_pos[:, 0], X_pos[:, 1], facecolors='none', edgecolors='orange', marker='o', linewidth=3, s=60)
    scatter(X_neg[:, 0], X_neg[:, 1], facecolors='none', edgecolors='blue', marker='o', linewidth=3, s=60)
    grid_pos, grid_neg = split_X(grid_X, grid_yhat)
    # Plot and color the grid of x, y values with class
    scatter(grid_pos[:, 0], grid_pos[:, 1], color='orange', marker='.', linewidth=0.05)
    scatter(grid_neg[:, 0], grid_neg[:, 1], color='blue', marker='.', linewidth=0.05)
    # Plot the decision boundary for the classification
    scatter(boundary_X, boundary_y, color='k')
    plot(boundary_X, boundary_y, color='k')
    # Plot Info
    show()
plot_decision_surface(X, y, boundary_x, boundary_y, grid_X, grid_yhat)
The failed attempt to plot the boundary is shown below:
Failed attempt to plot the boundary
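Rather than tracing the boundary by hand, one common approach (not taken from the book or the original post) is to keep the grid two-dimensional, reshape the predictions onto it, and let matplotlib's contour draw the 0.5 level of the 0/1 prediction grid. In this minimal sketch, knn_predict is a hypothetical NumPy stand-in for the poster's KNNClassifier, and the two-blob data are synthetic:

```python
import numpy as np
import matplotlib.pyplot as plt

def knn_predict(X_train, y_train, X_query, k=15):
    """Majority vote of the k nearest training points (Euclidean), labels 0/1.

    Hypothetical stand-in for the poster's KNNClassifier.predict.
    """
    # Squared distances, shape (n_query, n_train)
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

def plot_knn_boundary(X, y, k=15, step=0.1, pad=0.2):
    # Keep the grid 2D so predictions can be reshaped back onto it
    xx, yy = np.meshgrid(np.arange(X[:, 0].min() - pad, X[:, 0].max() + pad, step),
                         np.arange(X[:, 1].min() - pad, X[:, 1].max() + pad, step))
    zz = knn_predict(X, y, np.c_[xx.ravel(), yy.ravel()], k).reshape(xx.shape)
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.axis("off")
    # Small dots colour the predicted class at each grid point
    ax.scatter(xx.ravel(), yy.ravel(), c=np.where(zz.ravel() == 1, "orange", "blue"), marker=".", s=2)
    # The 0.5 level of the 0/1 prediction grid is the decision boundary
    ax.contour(xx, yy, zz, levels=[0.5], colors="k")
    # Training points as hollow circles
    ax.scatter(X[:, 0], X[:, 1], facecolors="none",
               edgecolors=np.where(y == 1, "orange", "blue"), linewidths=2, s=60)
    return fig, ax, zz

# Synthetic two-class data (assumed, not the book's mixture data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.7, (100, 2)), rng.normal(1.0, 0.7, (100, 2))])
y = np.repeat([0, 1], 100)
fig, ax, zz = plot_knn_boundary(X, y, k=15)
```

With the poster's classifier, the knn_predict call would become model.predict(grid_X) followed by .reshape(xx.shape). Because contour works on a 0/1 grid, the line follows the grid resolution; a smaller step gives a smoother boundary at a higher prediction cost.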

Plotting and Interpreting TensorFlow Results

The Problem:
I'm having trouble plotting and interpreting the results from my TensorFlow model. I've created my own CSV of [x, y, color] rows: randomly scattered dots whose colors form a clear pattern. I'm able to feed all the data into the model and train the neural network, but I can't seem to put it all together. I'm fairly new to this as a hobbyist.
Essentially, I want the ML algorithm to learn the pattern from 100 data points and apply it to a test dataset of points to plot an approximation of the pattern.
The Code:
import tensorflow as tf
LABEL_COLUMN = "Color"
LABELS = [0, 1]
def get_dataset(data_url, **kwargs):
    dataset = tf.data.experimental.make_csv_dataset(
        data_url,
        batch_size=5,
        label_name=LABEL_COLUMN,
        na_value="?",
        num_epochs=1,
        ignore_errors=True,
        **kwargs)
    return dataset
project_data = get_dataset(data_url)
project_test_data = get_dataset(test_data_url)
def pack(features, label):
    return tf.stack(list(features.values()), axis=-1), label
packed_data = project_data.map(pack)
packed_test_data = project_test_data.map(pack)
model2 = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1),
])
model2.compile(
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    optimizer="adam",
    # threshold=0.0 because the last layer outputs logits, not probabilities
    metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.0)]
)
model2.fit(packed_data, epochs=100)
model_output = model2.predict(packed_test_data)
model_output.plot()
This gives the following error:
AttributeError: 'numpy.ndarray' object has no attribute 'plot'
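model2.predict returns a plain NumPy array, which has no .plot method (that method belongs to pandas objects), so the predictions have to be drawn with matplotlib. A minimal sketch, assuming the test coordinates are available as an (n, 2) array test_xy in the same order as the predictions; scatter_predictions is a hypothetical helper, and the numbers in the usage lines are made up:

```python
import numpy as np
import matplotlib.pyplot as plt

def scatter_predictions(test_xy, logits):
    """Colour each test point by the model's predicted probability.

    test_xy: (n, 2) array of coordinates, in the same order fed to predict().
    logits: (n, 1) array from model2.predict(); the last Dense(1) layer has no
    activation and the loss uses from_logits=True, so these are logits.
    """
    logits = np.asarray(logits).ravel()
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid: logit -> P(Color == 1)
    labels = (logits > 0).astype(int)      # same as probs > 0.5
    fig, ax = plt.subplots()
    sc = ax.scatter(test_xy[:, 0], test_xy[:, 1], c=probs, cmap="RdBu", vmin=0.0, vmax=1.0)
    fig.colorbar(sc, ax=ax, label="predicted P(Color == 1)")
    return fig, ax, labels

# Made-up numbers standing in for the real test set and predictions
rng = np.random.default_rng(0)
test_xy = rng.uniform(-1.0, 1.0, (50, 2))
fake_logits = (test_xy[:, 0] - test_xy[:, 1])[:, None]
fig, ax, labels = scatter_predictions(test_xy, fake_logits)
```

One caveat: make_csv_dataset shuffles by default (shuffle=True), so the order of packed_test_data changes each time it is iterated. Creating the test set with get_dataset(test_data_url, shuffle=False) keeps the order fixed; the coordinates can then be collected with np.concatenate([xb.numpy() for xb, _ in packed_test_data]), assuming the CSV columns are x, y in that order. Call plt.show() to display the figure.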
Perhaps this function can be adapted to solve your problem
(from https://jonchar.net/notebooks/Artificial-Neural-Network-with-Keras/):
import numpy as np
import matplotlib.pyplot as plt
def plot_decision_boundary(X, y, model, steps=1000, cmap='Paired'):
    """
    Function to plot the decision boundary and data points of a model.
    Data points are colored based on their actual label.
    """
    cmap = plt.get_cmap(cmap)
    # Define region of interest by data limits
    xmin, xmax = X[:,0].min() - 1, X[:,0].max() + 1
    ymin, ymax = X[:,1].min() - 1, X[:,1].max() + 1
    x_span = np.linspace(xmin, xmax, steps)
    y_span = np.linspace(ymin, ymax, steps)
    xx, yy = np.meshgrid(x_span, y_span)
    # Make predictions across region of interest
    labels = model.predict(np.c_[xx.ravel(), yy.ravel()])
    # Plot decision boundary in region of interest
    z = labels.reshape(xx.shape)
    fig, ax = plt.subplots()
    ax.contourf(xx, yy, z, cmap=cmap, alpha=0.5)
    # Plot the training data, colored by actual label
    ax.scatter(X[:,0], X[:,1], c=y, cmap=cmap, lw=0)
    return fig, ax
plot_decision_boundary(X, y, model, cmap='RdBu')
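The function above expects X as an (n, 2) NumPy array and y as a label array, whereas the question's data live in a tf.data pipeline. A minimal sketch of loading the same CSV directly with NumPy, assuming a header row and the columns x, y, Color in that order (load_xy_color is a hypothetical helper; rows with "?" missing values would need np.genfromtxt instead):

```python
import io
import numpy as np

def load_xy_color(csv_source):
    """Load a CSV with a header row and columns x, y, Color (in that order,
    an assumption about the poster's file) into X of shape (n, 2) and y of shape (n,)."""
    data = np.loadtxt(csv_source, delimiter=",", skiprows=1, ndmin=2)
    return data[:, :2], data[:, 2].astype(int)

# In-memory CSV for illustration; a file path works the same way
csv_text = "x,y,Color\n0.1,0.2,0\n0.8,0.9,1\n0.4,0.7,1\n"
X, y = load_xy_color(io.StringIO(csv_text))
```

After that, plot_decision_boundary(X, y, model2, cmap='RdBu') followed by plt.show() should work, since Keras predict accepts NumPy arrays. Because model2 ends in Dense(1) trained with from_logits=True, the plotted values are logits, so the class boundary is where they cross 0, not 0.5.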

Accounting for noise in 2D Gaussian model

I need to fit a 2D Gaussian embedded in substantial uniform noise, as shown in the left plot below. I tried using sklearn.mixture.GaussianMixture with two components (code at the bottom), but this obviously fails, as shown in the right plot below.
I want to assign each element a probability of belonging to the 2D Gaussian and a probability of belonging to the uniform background noise. This seems like a simple enough task, but I've found no "simple" way to do it.
Any advice? It doesn't need to be a GMM; I'm open to other methods/packages.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import mixture
# Generate 2D Gaussian data
N_c = 100
xy_c = np.random.normal((.5, .5), .05, (N_c, 2))
# Generate uniform noise
N_n = 1000
xy_n = np.random.uniform(.0, 1., (N_n, 2))
# Combine into a single data set
data = np.concatenate([xy_c, xy_n])
# fit a Gaussian Mixture Model with two components
model = mixture.GaussianMixture(n_components=2, covariance_type='full')
model.fit(data)
probs = model.predict_proba(data)
labels = model.predict(data)
# Separate the two clusters for plotting
msk0 = labels == 0
c0, p0 = data[msk0], probs[msk0].T[0]
msk1 = labels == 1
c1, p1 = data[msk1], probs[msk1].T[1]
# Plot
plt.subplot(121)
plt.scatter(*xy_n.T, c='b', alpha=.5)
plt.scatter(*xy_c.T, c='r', alpha=.5)
plt.xlim(0., 1.)
plt.ylim(0., 1.)
plt.subplot(122)
plt.scatter(*c0.T, c=p0, alpha=.75)
plt.scatter(*c1.T, c=p1, alpha=.75)
plt.colorbar()
# display predicted scores by the model as a contour plot
X, Y = np.meshgrid(np.linspace(0., 1.), np.linspace(0., 1.))
XX = np.array([X.ravel(), Y.ravel()]).T
Z = -model.score_samples(XX)
Z = Z.reshape(X.shape)
plt.contour(X, Y, Z)
plt.show()
I think kernel density estimation can help you localize the Gaussian and exclude points outside of it (e.g. in areas with lower density).
Here is some example code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import mixture
from sklearn.neighbors import KernelDensity
# Generate 2D Gaussian data
N_c = 100
xy_c = np.random.normal((.2, .2), .05, (N_c, 2))
# Generate uniform noise
N_n = 1000
xy_n = np.random.uniform(.0, 1., (N_n, 2))
# Combine into a single data set
data = np.concatenate([xy_c, xy_n])
print(data.shape)
model = KernelDensity(kernel='gaussian',bandwidth=0.05)
model.fit(data)
probs = model.score_samples(data)  # log-density at each point, not a probability
# Plot
plt.subplot(131)
plt.scatter(*xy_n.T, c='b', alpha=.5)
plt.scatter(*xy_c.T, c='r', alpha=.5)
plt.xlim(0., 1.)
plt.ylim(0., 1.)
# plot kernel score
plt.subplot(132)
plt.scatter(*data.T, c=probs, alpha=.5)
# display predicted scores by the model as a contour plot
X, Y = np.meshgrid(np.linspace(0., 1.), np.linspace(0., 1.))
XX = np.array([X.ravel(), Y.ravel()]).T
Z = -model.score_samples(XX)
Z = Z.reshape(X.shape)
plt.contour(X, Y, Z)
plt.xlim(0,1)
plt.ylim(0,1)
# plot kernel score with threshold
plt.subplot(133)
plt.scatter(*data.T, c=probs>0.5, alpha=.5)  # threshold on the log-density; adjust as needed
plt.colorbar()
plt.xlim(0,1)
plt.ylim(0,1)
plt.show()
And this is the output figure:
I moved the center of the Gaussian to make sure my code was working. The right panel displays the kernel score with a threshold, which you can use to filter out the noise outside the Gaussian; however, this cannot filter out the noise points that lie inside the Gaussian.
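A different technique, not from the original thread: fit an explicit two-component mixture, one Gaussian plus a uniform background, by expectation-maximization (EM). Unlike a KDE threshold, this gives each point a posterior probability of belonging to the Gaussian, including points that lie inside it. This sketch assumes 2D data on the unit square, so the background density is 1/area with area = 1; the function name fit_gauss_plus_uniform and the initial guesses are assumptions.

```python
import numpy as np

def fit_gauss_plus_uniform(data, area=1.0, n_iter=200, tol=1e-8):
    """EM for a mixture of one 2D Gaussian and a uniform background of density 1/area.

    Returns (w, mean, cov, resp): the Gaussian's mixing weight, its mean and
    covariance, and resp[i], the posterior probability (from the last E-step)
    that point i belongs to the Gaussian.
    """
    n = data.shape[0]
    # Initial guesses: centre on the median, covariance narrower than the data spread
    mean = np.median(data, axis=0)
    cov = np.cov(data.T) / 4
    w = 0.5
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: Gaussian density (2D normalization) and uniform density per point
        diff = data - mean
        mahal = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        g = np.exp(-0.5 * mahal) / (2 * np.pi * np.sqrt(np.linalg.det(cov)))
        num = w * g
        den = num + (1 - w) / area
        resp = num / den
        ll = np.log(den).sum()
        # M-step: weighted updates of the Gaussian's parameters
        w = resp.mean()
        mean = (resp[:, None] * data).sum(axis=0) / resp.sum()
        diff = data - mean
        cov = (resp[:, None, None] * np.einsum("ij,ik->ijk", diff, diff)).sum(axis=0) / resp.sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return w, mean, cov, resp

# Same kind of data as in the question
rng = np.random.default_rng(0)
xy_c = rng.normal((.5, .5), .05, (100, 2))
xy_n = rng.uniform(0., 1., (1000, 2))
data = np.concatenate([xy_c, xy_n])
w, mean, cov, resp = fit_gauss_plus_uniform(data)
```

plt.scatter(*data.T, c=resp) followed by plt.colorbar() would then show the membership probability of each point. Noise points that happen to fall near the centre still get a high probability, which is unavoidable without extra information.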

plot_decision_boundary() not giving any output

So I have been trying to view the decision boundary of my network, but for some reason, when I run the code, it doesn't produce any output.
I took the function from here.
It doesn't raise any error; the run just ends.
import numpy as np
import matplotlib.pyplot as plt
# Fit the model and keep the training history
history = model.fit(X, Y, validation_split=0.30, epochs=10, batch_size=1000, verbose=1)
# Evaluate the model
scores = model.evaluate(X, Y)
def plot_decision_boundary(X, y, model, steps=1000, cmap='Paired'):
    """
    Function to plot the decision boundary and data points of a model.
    Data points are colored based on their actual label.
    """
    cmap = plt.get_cmap(cmap)
    # Define region of interest by data limits
    xmin, xmax = X[:,0].min() - 1, X[:,0].max() + 1
    ymin, ymax = X[:,1].min() - 1, X[:,1].max() + 1
    x_span = np.linspace(xmin, xmax, steps)
    y_span = np.linspace(ymin, ymax, steps)
    xx, yy = np.meshgrid(x_span, y_span)
    # Make predictions across region of interest
    labels = model.predict(np.c_[xx.ravel(), yy.ravel()])
    # Plot decision boundary in region of interest
    z = labels.reshape(xx.shape)
    fig, ax = plt.subplots()
    ax.contourf(xx, yy, z, cmap=cmap, alpha=0.5)
    # Plot the training data, colored by actual label
    ax.scatter(X[:,0], X[:,1], c=y, cmap=cmap, lw=0)
    return fig, ax
plot_decision_boundary(X, Y, model, cmap='RdBu')
I haven't made many changes to the function.
What am I missing here?
Your function plot_decision_boundary() constructs fig and ax objects, which are returned at the end. In your code, nothing receives these objects when they are returned. A function returning fig and ax does not mean they are automatically drawn.
The solution is simple: just call
plt.show()
after calling the decision boundary function.
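This step is often omitted in example code. I believe that is because there are several ways to open a window and show the plot (you might also want to save the figure directly to a file, in which case you wouldn't need the show() call). In a Jupyter notebook with the inline backend, for example, figures are displayed automatically at the end of a cell.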
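For the save-to-file route, a minimal sketch: keep the returned fig and call savefig on it. make_plot is a hypothetical stand-in for plot_decision_boundary() so the example runs on its own, and the output path is an arbitrary temporary file.

```python
import os
import tempfile
import numpy as np
import matplotlib.pyplot as plt

def make_plot():
    """Hypothetical stand-in for plot_decision_boundary(): builds a figure and returns it."""
    fig, ax = plt.subplots()
    xx, yy = np.meshgrid(np.linspace(-1, 1, 50), np.linspace(-1, 1, 50))
    ax.contourf(xx, yy, xx + yy, cmap="RdBu", alpha=0.5)
    return fig, ax

fig, ax = make_plot()  # keep the returned objects
out_path = os.path.join(tempfile.gettempdir(), "decision_boundary.png")
fig.savefig(out_path, dpi=100)  # write to disk instead of (or before) plt.show()
plt.close(fig)  # release the figure, useful when plotting many in a loop
```

In the question's code, the equivalent is fig, ax = plot_decision_boundary(X, Y, model, cmap='RdBu') followed by either plt.show() or fig.savefig(...).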
