I have written code to fit a Gaussian function to a dataset with scipy's curve_fit. There are several datasets: one group with 19 points each and one group with 21 points each, and each group contains datasets covering the ranges 0.5-0.7, 1.0-1.2 and 1.5-1.7.
Surprisingly, the fit succeeded for all three 19-point datasets, but for the 21-point datasets only the 1.5-1.7 range gave the right fit. The others gave a badly wrong fit.
Here is the code.
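import math
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
#function declaration
def gauss(x, amp, mu, sigma):
    y = amp*np.exp(-(x-mu)**2/(2*sigma**2))
    return y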
#fitting
popt, pcov = curve_fit(f = gauss, xdata = x, ydata = y)
#print(popt)
amp = popt[0]
mu = popt[1]
sigma = popt[2]
print(amp,mu,sigma)
#krypton value
krypton_y = amp/((math.exp(1))**2)
#print(krypton_y)
krypton_x1 = mu + math.sqrt((-2*(sigma**2))*math.log(krypton_y/amp))
krypton_x2 = mu - math.sqrt((-2*(sigma**2))*math.log(krypton_y/amp))
print(krypton_x1-krypton_x2)
#print(gauss([krypton_x1, krypton_x2], popt[0], popt[1], popt[2]))
#horizontal line
horizontal_x = np.arange(min(x)-0.01, max(x)+0.02, 0.01)
horizontal_y = np.repeat(0, len(horizontal_x))
#build fit set
x_test = np.arange(min(x), max(x), 0.0000001)
y_test = gauss(x_test, popt[0], popt[1], popt[2])
y_krypton = []
for i in horizontal_x:
    y_krypton.append(krypton_y)
#Vertical lines
vertical_y = np.arange(-20, amp+20, 0.01)
l = len(vertical_y)
vertical_mean = np.repeat(mu, l)
#fit data
fig = plt.figure()
fig = plt.scatter(x,y, label ='original data', color = 'red', marker = 'x')
fig = plt.plot(x_test, y_test, label = 'Gaussian fit curve')
fig = plt.plot(horizontal_x, y_krypton, color = '#830000', linewidth = 1)
fig = plt.plot(vertical_mean, vertical_y, color = '#0011ed')
fig = plt.xlabel('Distance in mm')
fig = plt.ylabel('Current in nA')
fig = plt.title('Intensity Profile for '+gas+' laser | Z = '+str(z)+'cm')
fig = plt.scatter(mu, amp, s = 25, color = '#0011ed')
fig = plt.scatter(krypton_x1, krypton_y, s = 25, color = '#830000')
fig = plt.scatter(krypton_x2, krypton_y, s = 25, color = '#830000')
plt.annotate('('+"{:.4f}".format(mu)+','+"{:.4f}".format(amp)+')', (mu, amp), xytext = (mu+0.002,amp+0.5))
plt.annotate('('+"{:.4f}".format(krypton_x1)+','+"{:.4f}".format(krypton_y)+')', (krypton_x1, krypton_y), xytext = (krypton_x1+0.002,krypton_y+0.5))
plt.annotate('('+"{:.4f}".format(krypton_x2)+','+"{:.4f}".format(krypton_y)+')', (krypton_x2, krypton_y), xytext = (krypton_x2+0.002,krypton_y+0.5))
plt.legend()
plt.margins(0)
plt.show()
I am also adding two images, the correct fit and the wrong fit.
To make the difficulty clear, we will use an elementary regression method.
The fit involves ln(y), which is infinite at the points k<6 and k>16 (where y = 0). Those points cannot be used in the numerical calculation. The point k=16 is also unreliable, because the small value y=0.001 is not accurate enough (only one significant digit). So we use only the points from k=6 to k=15 in the calculation.
This shows that the non-significant points have to be eliminated. Of course, the more sophisticated iterative methods implemented in nonlinear regression packages give a better fit according to whatever fitting criterion the software specifies.
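The elimination of non-significant points described above can be sketched in Python. This is only a minimal illustration and not the calculation used in this answer: the function name gaussian_from_log_parabola, the rel_threshold cutoff (1% of the peak) and the synthetic data are all assumptions. The sketch fits a parabola to ln(y) on the retained points and converts its coefficients to amp, mu and sigma.

```python
import numpy as np

def gaussian_from_log_parabola(x, y, rel_threshold=0.01):
    """Estimate (amp, mu, sigma) of a Gaussian from a parabola fit to ln(y).

    rel_threshold is an assumed cutoff: points at or below this fraction of
    the peak are dropped, because ln(y) is unusable or unreliable there.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = y > rel_threshold * y.max()      # discards zeros and tiny values
    x_center = x[keep].mean()               # centre x for better conditioning
    a, b, c = np.polyfit(x[keep] - x_center, np.log(y[keep]), 2)
    if a >= 0:
        raise ValueError("the retained points do not form a peak")
    sigma = np.sqrt(-1.0 / (2.0 * a))
    mu = x_center - b / (2.0 * a)
    amp = np.exp(c - b**2 / (4.0 * a))
    return amp, mu, sigma

# Synthetic 21-point dataset (assumed values, not the data from the question)
x_demo = np.linspace(0.5, 0.7, 21)
y_demo = 5.0 * np.exp(-(x_demo - 0.6) ** 2 / (2 * 0.02 ** 2))
y_demo[y_demo < 1e-3] = 0.0                 # mimic readings that drop to zero
amp_est, mu_est, sigma_est = gaussian_from_log_parabola(x_demo, y_demo)
```

These estimates can also be passed to curve_fit as the initial guess p0, which otherwise starts from all parameters equal to 1.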
I'm training a KNN model and I want to plot two images per loop iteration, as shown in the image below:
What I need
On the left, I plot the decision boundary of my model for a given number of neighbours. On the right, I plot the confusion matrix.
To accomplish something along those lines I've written the following code:
fig = plt.figure()
for i in range(1,3):
    neigh = KNeighborsClassifier(n_neighbors=i)
    neigh.fit(X, y)
    y_pred = neigh.predict(X)
    acc = accuracy_score(y_pred,y)
    # Boundary
    ax1 = fig.add_subplot(1,2,1)
    visualize_classifier(neigh, X, y, ax=ax1) # Defined by me
    # Plot confusion matrix. Defined by sklearn.metrics
    ax2 = fig.add_subplot(1,2,2)
    plot_confusion_matrix(neigh, X, y, cmap=plt.cm.Blues, values_format = '.0f',ax=ax2)
    ax1.set_title(f'Neighbors = {i}.\nAccuracy = {acc:.4f}',
                  fontsize = 14)
    ax2.set_title(f'Neighbors = {i}.\nAccuracy = {acc:.4f}',
                  fontsize = 14)
    plt.tight_layout()
    plt.figure(i)
    plt.show()
The visualize_classifier() function:
def visualize_classifier(model, X, y, ax=None, cmap='Dark2'):
    ax = ax or plt.gca()
    # Plot the training points
    ax.scatter(X.iloc[:, 0], X.iloc[:, 1], c=y, s=30, cmap=cmap, # Changed to iloc.
               clim=(y.min(), y.max()), zorder=3, alpha = 0.5)
    ax.axis('tight')
    ax.set_xlabel('x1')
    ax.set_ylabel('x2')
    # ax.axis('off')
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    xx, yy = np.meshgrid(np.linspace(*xlim, num=200),
                         np.linspace(*ylim, num=200))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    # Create a color plot with the results
    n_classes = len(np.unique(y))
    contours = ax.contourf(xx, yy, Z, alpha=0.3,
                           levels=np.arange(n_classes + 1) - 0.5,
                           cmap=cmap, clim=(y.min(), y.max()),
                           zorder=1)
    ax.set(xlim=xlim, ylim=ylim)
What I get
What I get. Continues...
As you can see, only the first iteration is plotted. The second one is not plotted and I can't figure out why.
Furthermore, I have the same title for the plot at the right and at the left. I would like to have only one on top of both, how can this be accomplished?
Now, you might be wondering why do I need to do this and the answer is that I would like to see how the boundaries change depending on the number of neighbors. It's just to get a visual sense of KNN algorithm.
Any suggestion would be pretty much appreciated.
I was able to make it work. The problem was the first line inside the for loop: I now create a new figure on every iteration by assigning plt.figure(i, figsize=(18, 8)) to the variable fig.
for i in range(1,30):
    fig = plt.figure(i, figsize=(18, 8))
    sns.set(font_scale=2.0) # Adjust to fit
    neigh = KNeighborsClassifier(n_neighbors=i)
    neigh.fit(X, y)
    y_pred = neigh.predict(X)
    acc = accuracy_score(y_pred,y)
    # Boundary
    ax1 = fig.add_subplot(1,2,1)
    visualize_classifier(neigh, X, y, ax=ax1) # Defined by me
    # Plot confusion matrix. Defined by sklearn.metrics
    ax2 = fig.add_subplot(1,2,2)
    plot_confusion_matrix(neigh, X, y, cmap=plt.cm.Blues, values_format = '.0f',ax=ax2)
    fig.suptitle(f'Neighbors = {i}. Accuracy = {acc:.4f}',y=1)
    plt.show()
For the title I used: fig.suptitle(f'Neighbors = {i}. Accuracy = {acc:.4f}',y=1)
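The pattern behind this fix (a fresh figure per iteration, two axes side by side, one shared title) can be shown without the classifier. The sketch below is an illustration under assumptions: make_pair_figure is a hypothetical helper, random data stands in for the boundary plot and the confusion matrix, and the Agg backend is chosen only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

def make_pair_figure(k, accuracy):
    """Create one NEW figure with two side-by-side axes and a single title."""
    fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))
    rng = np.random.default_rng(k)
    # Placeholder for the decision-boundary plot
    ax_left.scatter(rng.normal(size=30), rng.normal(size=30))
    # Placeholder for the confusion matrix
    ax_right.imshow(rng.integers(0, 10, size=(2, 2)), cmap="Blues")
    # suptitle puts one title above both axes instead of one per axes
    title = fig.suptitle(f"Neighbors = {k}. Accuracy = {accuracy:.4f}")
    return fig, title

# One figure per loop iteration; the accuracy values are made up
figures = [make_pair_figure(k, accuracy=1.0 / k) for k in range(1, 4)]
```

Because each call creates its own figure, nothing from an earlier iteration is overwritten or reused.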
I have some trouble plotting the image which is in my head.
I want to visualize the Kernel-trick with Support Vector Machines. So I made some two-dimensional data consisting of two circles (an inner and an outer circle) which should be separated by a hyperplane. Obviously this isn't possible in two dimensions - so I transformed them into 3D. Let n be the number of samples. Now I have an (n,3)-array (3 columns, n rows) X of data points and an (n,1)-array y with labels. Using sklearn I get the linear classifier via
clf = svm.SVC(kernel='linear', C=1000)
clf.fit(X, y)
I already plot the data points as scatter plot via
plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)
Now I want to plot the separating hyperplane as a surface plot. My problem is that there is no explicit representation of the hyperplane, because the decision function only defines it implicitly via decision_function = 0. So I need to plot the level set (at level 0) of a 4-dimensional object.
Since I'm not a python expert I would appreciate if somebody could help me out! And I know that this isn't really the "style" of using a SVM but I need this image as an illustration for my thesis.
Edit: my current "code"
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_blobs, make_circles
from tikzplotlib import save as tikz_save
plt.close('all')
# we create 50 separable points
#X, y = make_blobs(n_samples=40, centers=2, random_state=6)
X, y = make_circles(n_samples=50, factor=0.5, random_state=4, noise=.05)
X2, y2 = make_circles(n_samples=50, factor=0.2, random_state=5, noise=.08)
X = np.append(X,X2, axis=0)
y = np.append(y,y2, axis=0)
# shift X to [0,2]x[0,2]
X = np.array([[item[0] + 1, item[1] + 1] for item in X])
X[X<0] = 0.01
clf = svm.SVC(kernel='rbf', C=1000)
clf.fit(X, y)
plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)
# plot the decision function
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
# create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = clf.decision_function(xy).reshape(XX.shape)
# plot decision boundary and margins
ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--','-','--'])
# plot support vectors
ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=100,
linewidth=1, facecolors='none', edgecolors='k')
################## KERNEL TRICK - 3D ##################
trans_X = np.array([[item[0]**2, item[1]**2, np.sqrt(2*item[0]*item[1])] for item in X])
fig = plt.figure()
ax = plt.axes(projection ="3d")
# creating scatter plot
ax.scatter3D(trans_X[:,0],trans_X[:,1],trans_X[:,2], c = y, cmap=plt.cm.Paired)
clf2 = svm.SVC(kernel='linear', C=1000)
clf2.fit(trans_X, y)
ax = plt.gca()  # the current axes is already the 3D one; newer Matplotlib rejects keyword arguments here
xlim = ax.get_xlim()
ylim = ax.get_ylim()
zlim = ax.get_zlim()
### from here i don't know what to do ###
xx = np.linspace(xlim[0], xlim[1], 3)
yy = np.linspace(ylim[0], ylim[1], 3)
zz = np.linspace(zlim[0], zlim[1], 3)
ZZ, YY, XX = np.meshgrid(zz, yy, xx)
xyz = np.vstack([XX.ravel(), YY.ravel(), ZZ.ravel()]).T
Z = clf2.decision_function(xyz).reshape(XX.shape)
#ax.contour(XX, YY, ZZ, Z, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--','-','--'])
Desired Output
I want to get something like that.
In general I want to reconstruct what they do in this article, especially "Non-linear transformations".
Part of your question is addressed in this question on linear-kernel SVM. It is only a partial answer, because only linear kernels can be represented this way, i.e. through the hyperplane coefficients that the estimator exposes when a linear kernel is used.
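For the linear-kernel case, the explicit surface follows from solving w·p + b = 0 for the third coordinate. The sketch below is a minimal illustration: plane_surface is a hypothetical helper, and w_demo and b_demo are made-up numbers standing in for clf.coef_[0] and clf.intercept_[0] of a fitted linear SVC.

```python
import numpy as np

def plane_surface(w, b, xlim, ylim, n=20):
    """Return grids (xx, yy, zz) lying on the plane w[0]*x + w[1]*y + w[2]*z + b = 0."""
    w = np.asarray(w, dtype=float)
    if np.isclose(w[2], 0.0):
        raise ValueError("plane is vertical; z cannot be written as a function of x and y")
    xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], n),
                         np.linspace(ylim[0], ylim[1], n))
    zz = (-b - w[0] * xx - w[1] * yy) / w[2]
    return xx, yy, zz

# Made-up coefficients; with a fitted linear SVC they would come from
# clf.coef_[0] and clf.intercept_[0]
w_demo = np.array([1.0, -2.0, 0.5])
b_demo = 0.25
xx, yy, zz = plane_surface(w_demo, b_demo, xlim=(0.0, 4.0), ylim=(0.0, 4.0))

# Every grid point should make the decision function zero
residual = w_demo[0] * xx + w_demo[1] * yy + w_demo[2] * zz + b_demo
```

On a 3D axes, the three arrays can be drawn with ax.plot_surface(xx, yy, zz, alpha=0.3).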
Another solution is to find the isosurface with marching_cubes
This solution requires installing the scikit-image toolkit (https://scikit-image.org), which can find an isosurface of a given value (here I used 0, since the decision function represents the distance to the hyperplane) from the mesh grid of 3D coordinates.
In the code below (copied from yours), I implement the idea for any kernel (the example uses the RBF kernel); the output is shown beneath the code. Please also see my footnote about 3D plotting with matplotlib, which may be another issue in your case.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from skimage import measure
from sklearn.datasets import make_blobs, make_circles
from tikzplotlib import save as tikz_save
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
plt.close('all')
# we create 50 separable points
#X, y = make_blobs(n_samples=40, centers=2, random_state=6)
X, y = make_circles(n_samples=50, factor=0.5, random_state=4, noise=.05)
X2, y2 = make_circles(n_samples=50, factor=0.2, random_state=5, noise=.08)
X = np.append(X,X2, axis=0)
y = np.append(y,y2, axis=0)
# shift X to [0,2]x[0,2]
X = np.array([[item[0] + 1, item[1] + 1] for item in X])
X[X<0] = 0.01
clf = svm.SVC(kernel='rbf', C=1000)
clf.fit(X, y)
plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)
# plot the decision function
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
# create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = clf.decision_function(xy).reshape(XX.shape)
# plot decision boundary and margins
ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--','-','--'])
# plot support vectors
ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=100,
linewidth=1, facecolors='none', edgecolors='k')
################## KERNEL TRICK - 3D ##################
trans_X = np.array([[item[0]**2, item[1]**2, np.sqrt(2*item[0]*item[1])] for item in X])
fig = plt.figure()
ax = plt.axes(projection ="3d")
# creating scatter plot
ax.scatter3D(trans_X[:,0],trans_X[:,1],trans_X[:,2], c = y, cmap=plt.cm.Paired)
clf2 = svm.SVC(kernel='rbf', C=1000)
clf2.fit(trans_X, y)
# clf2.coef_ only exists for a linear kernel, so no explicit plane z(x, y) is defined here
ax = plt.gca()  # the current axes is already the 3D one; newer Matplotlib rejects keyword arguments here
xlim = ax.get_xlim()
ylim = ax.get_ylim()
zlim = ax.get_zlim()
### from here i don't know what to do ###
xx = np.linspace(xlim[0], xlim[1], 50)
yy = np.linspace(ylim[0], ylim[1], 50)
zz = np.linspace(zlim[0], zlim[1], 50)
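XX, YY, ZZ = np.meshgrid(xx, yy, zz, indexing='ij')  # 'ij' keeps the array axes in (x, y, z) order
xyz = np.vstack([XX.ravel(), YY.ravel(), ZZ.ravel()]).T
Z = clf2.decision_function(xyz).reshape(XX.shape)
# find isosurface with marching cubes
dx = xx[1] - xx[0]
dy = yy[1] - yy[0]
dz = zz[1] - zz[0]
# marching_cubes_lewiner is no longer available in recent scikit-image releases;
# marching_cubes uses the Lewiner method by default
verts, faces, _, _ = measure.marching_cubes(Z, 0, spacing=(1, 1, 1), step_size=2)
# convert vertex positions from grid indices to data coordinates
verts *= np.array([dx, dy, dz])
verts += np.array([xlim[0], ylim[0], zlim[0]])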
# add as Poly3DCollection
mesh = Poly3DCollection(verts[faces])
mesh.set_facecolor('g')
mesh.set_edgecolor('none')
mesh.set_alpha(0.3)
ax.add_collection3d(mesh)
ax.view_init(20, -45)
plt.savefig('kerneltrick')
Running the code produces the following image with Matplotlib, where the green semi-transparent surface represents the non-linear decision boundary.
Footnote: 3D plotting with matplotlib
Note that Matplotlib's 3D toolkit cannot always handle the depth ordering of objects, because it can conflict with the zorder of the object. This is why the surface sometimes appears to be drawn on top of the points even when it should be behind them. This is a known issue discussed in the matplotlib 3d documentation and in this answer.
If you want to have better rendering results, you may want to use Mayavi, as recommended by the Matplotlib developers, or any other 3D Python plotting library.
The Problem:
I'm having trouble plotting and interpreting the results from my TensorFlow model. I've created my own CSV of [x, y, color] where there is a plot of randomly scattered dots with a clear pattern in the color formation. I'm able to enter all the data into the model and train the neural network but can't seem to put it all together. I'm a bit new to this as a hobbyist.
Essentially I want the ML algorithm to pick up the pattern from 100 datapoints and use it on a test dataset of nodes to plot an approximation of the pattern.
The Code:
LABEL_COLUMN = "Color"
LABELS=[0,1]
def get_dataset(data_url, **kwargs):
    dataset = tf.data.experimental.make_csv_dataset(
        data_url,
        batch_size=5,
        label_name=LABEL_COLUMN,
        na_value="?",
        num_epochs=1,
        ignore_errors=True,
        **kwargs)
    return dataset
project_data = get_dataset(data_url)
project_test_data = get_dataset(test_data_url)
def pack(features,label):
    return tf.stack(list(features.values()), axis=-1), label
packed_data = project_data.map(pack)
packed_test_data = project_test_data.map(pack)
model2 = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1),
])
model2.compile(
    loss = tf.keras.losses.BinaryCrossentropy(from_logits=True),
    optimizer = "adam",
    metrics = ["accuracy"]
)
model2.fit(packed_data, epochs=100)
model_output = model2.predict(packed_test_data)
model_output.plot()
Gives the below error:
AttributeError: 'numpy.ndarray' object has no attribute 'plot'
Perhaps this function can be adapted to solve your problem?
(From https://jonchar.net/notebooks/Artificial-Neural-Network-with-Keras/)
import numpy as np
import matplotlib.pyplot as plt
def plot_decision_boundary(X, y, model, steps=1000, cmap='Paired'):
    """
    Function to plot the decision boundary and data points of a model.
    Data points are colored based on their actual label.
    """
    cmap = plt.get_cmap(cmap)
    # Define region of interest by data limits
    xmin, xmax = X[:,0].min() - 1, X[:,0].max() + 1
    ymin, ymax = X[:,1].min() - 1, X[:,1].max() + 1
    steps = 1000
    x_span = np.linspace(xmin, xmax, steps)
    y_span = np.linspace(ymin, ymax, steps)
    xx, yy = np.meshgrid(x_span, y_span)
    # Make predictions across region of interest
    labels = model.predict(np.c_[xx.ravel(), yy.ravel()])
    # Plot decision boundary in region of interest
    z = labels.reshape(xx.shape)
    fig, ax = plt.subplots()
    ax.contourf(xx, yy, z, cmap=cmap, alpha=0.5)
    # Get predicted labels on training data and plot
    train_labels = model.predict(X)
    ax.scatter(X[:,0], X[:,1], c=y, cmap=cmap, lw=0)
    return fig, ax
plot_decision_boundary(X, y, model, cmap='RdBu')
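One point to keep in mind when adapting this to the model in the question: model2 ends in Dense(1) and is trained with from_logits=True, so model2.predict returns a NumPy array of raw logits, which has no .plot method. A minimal sketch of turning such output into probabilities and 0/1 labels before plotting follows; logits_to_labels is a hypothetical helper and demo_logits is made-up data shaped like the output of predict.

```python
import numpy as np

def logits_to_labels(logits, threshold=0.5):
    """Convert raw logits of shape (n, 1) into probabilities and 0/1 labels."""
    logits = np.asarray(logits, dtype=float).reshape(-1)
    probabilities = 1.0 / (1.0 + np.exp(-logits))   # sigmoid
    labels = (probabilities > threshold).astype(int)
    return probabilities, labels

# Made-up values standing in for model2.predict(packed_test_data)
demo_logits = np.array([[-2.0], [0.0], [3.0]])
probs, labels = logits_to_labels(demo_logits)
```

The labels can then be used as the colour argument of a Matplotlib scatter plot of the test points, for example plt.scatter(x, y, c=labels).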
I am trying to plot a decision plot boundary of model prediction by Keras. However, the boundary that is generated seems incorrect.
Here's my model
def base():
    model = Sequential()
    model.add(Dense(5,activation = 'relu', input_dim = 2))
    model.add(Dense(2,activation = 'relu'))
    model.add(Dense(1,activation = 'sigmoid'))
    model.compile(optimizer = optimizers.SGD(lr=0.0007, momentum=0.0, decay=0.0), loss = 'binary_crossentropy', metrics= ['accuracy'])
    return model
model = base()
history = model.fit(train_X,train_Y, epochs = 10000, batch_size =64, verbose = 2)
And here's my plot function (taken from here)
def plot_decision_boundary(X, y, model, steps=1000, cmap='Paired'):
    """
    Function to plot the decision boundary and data points of a model.
    Data points are colored based on their actual label.
    """
    cmap = get_cmap(cmap)
    # Define region of interest by data limits
    xmin, xmax = X[:,0].min() - 1, X[:,0].max() + 1
    ymin, ymax = X[:,1].min() - 1, X[:,1].max() + 1
    steps = 1000
    x_span = linspace(xmin, xmax, steps)
    y_span = linspace(ymin, ymax, steps)
    xx, yy = meshgrid(x_span, y_span)
    # Make predictions across region of interest
    labels = model.predict(c_[xx.ravel(), yy.ravel()])
    # Plot decision boundary in region of interest
    z = labels.reshape(xx.shape)
    fig, ax = subplots()
    ax.contourf(xx, yy, z, cmap=cmap, alpha=0.5)
    # Get predicted labels on training data and plot
    train_labels = model.predict(X)
    ax.scatter(X[:,0], X[:,1], c=y.ravel(), cmap=cmap, lw=0)
    return fig, ax
plot_decision_boundary(train_X,train_Y, model, cmap = 'RdBu')
And I get a plot like this
This is obviously a very flawed depiction of a decision boundary (it is not informative at all because there are so many boundaries). Can somebody point out the error in my case?
Since the predicted probability is a continuous value between 0 and 1, contourf draws many contour levels.
If your visualization is restricted to 2 classes (the output is a 2D softmax vector), you can use this simple code:
def plot_model_out(x,y,model):
    """
    x,y: 2D MeshGrid input
    model: Keras Model API Object
    """
    # one (x, y) row per grid point, in the same order as x.ravel()
    grid = np.stack((x.ravel(), y.ravel()), axis=-1)
    outs = model.predict(grid)
    # output for the first class, reshaped back onto the grid
    y1 = outs[:, 0].reshape(x.shape)
    plt.contourf(x,y,y1)
    plt.show()
This will give several contours. If you want a single contour line, you can threshold the probability output from model.predict and display the result.
For example,
import numpy as np
from matplotlib import pyplot as plt
a = np.linspace(-5, 5, 100)
xx, yy = np.meshgrid(a,a)
z = xx**2 + yy**2
# z = z > 5 (Threshold value)
plt.contourf(xx, yy, z,)
plt.show()
With the threshold line commented out and then enabled, we get two images:
Multiple contours due to continuous values
Single contour as the z is thresholded (z = z > 5)
A similar method can be used on the output softmax vector like this
label = label > 0.5
For more information on visualization code, refer to the IITM CVI Blog.
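Putting the thresholding step together with a prediction grid might look like the sketch below. It is an illustration under assumptions: hypothetical_predict is a stand-in for model.predict that returns one sigmoid probability per point, and the grid range is copied from the example above.

```python
import numpy as np

def hypothetical_predict(points):
    """Stand-in for model.predict: one sigmoid probability per row, shape (n, 1)."""
    score = points[:, 0] + points[:, 1]
    return (1.0 / (1.0 + np.exp(-score))).reshape(-1, 1)

a = np.linspace(-5, 5, 100)
xx, yy = np.meshgrid(a, a)

# One row per grid point, then back to the grid shape for contourf
grid = np.c_[xx.ravel(), yy.ravel()]
probs = hypothetical_predict(grid).reshape(xx.shape)

# Continuous values give many contour levels; thresholded values give two regions
labels = probs > 0.5
```

Passing labels instead of probs to plt.contourf(xx, yy, ...) then draws only the two class regions.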