Bad K-means with Gradient Descent using TensorFlow

I am currently learning TensorFlow and trying to implement k-means clustering with it. I am following a TensorFlow tutorial that first introduces k-means and then introduces gradient descent optimization.
We first generate samples:
def create_samples(n_clusters, n_samples_per_cluster, n_features, embiggen_factor, seed):
    np.random.seed(seed)
    slices = []
    centroids = []
    # Create samples for each cluster
    for i in range(n_clusters):
        samples = tf.random_normal((n_samples_per_cluster, n_features),
                                   mean=0.0, stddev=5.0, dtype=tf.float32, seed=seed, name="cluster_{}".format(i))
        current_centroid = (np.random.random((1, n_features)) * embiggen_factor) - (embiggen_factor/2)
        centroids.append(current_centroid)
        samples += current_centroid
        slices.append(samples)
    # Create a big "samples" dataset
    samples = tf.concat(0, slices, name='samples')
    centroids = tf.concat(0, centroids, name='centroids')
    return centroids, samples
Then define two functions, assign and update (plus a Euclidean distance helper), as usual:
def assign(data, centroids):
    # Explanations here: http://learningtensorflow.com/lesson6/
    expanded_vectors = tf.expand_dims(samples, 0)
    expanded_centroids = tf.expand_dims(centroids, 1)
    # nice trick here: use 'sub' "pairwisely" (thats why we just used "expand")
    #
    distances = tf.reduce_sum( tf.square(
        tf.sub(expanded_vectors, expanded_centroids)), 2)
    mins = tf.argmin(distances, 0)
    nearest_indices = mins
    return nearest_indices
def update(data, nearest_indices, n_clusters):
    # Updates the centroid to be the mean of all samples associated with it.
    nearest_indices = tf.to_int32(nearest_indices)
    partitions = tf.dynamic_partition(samples, nearest_indices, n_clusters)
    new_centroids = tf.concat(0, [tf.expand_dims(tf.reduce_mean(partition, 0), 0) for partition in partitions])
    return new_centroids
def euclidian_distance(x, y):
    sqd = tf.squared_difference(tf.cast(x, "float32"),tf.cast(y, "float32"))
    sumsqd = tf.reduce_sum(sqd)
    sqrtsumsqd = tf.sqrt(sumsqd)
    return sqrtsumsqd
Then define the TensorFlow model to run:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt  # needed for the plt.* calls below
nclusters = 3
nsamplespercluster = 500
nfeatures = 2
embiggenfactor = 70
seed = 700
np.random.seed(seed)
ocentroids, samples = create_samples(nclusters, nsamplespercluster, nfeatures, embiggenfactor, seed)
X = tf.placeholder("float32", [nclusters*nsamplespercluster, 2])
# chosing random sample points as initial centroids.
centroids = tf.Variable([samples[i] for i in np.random.choice(range(nclusters*nsamplespercluster), nclusters)])#, [10.,10.]])
mean=0.0, stddev=150, dtype=tf.float32, seed=seed))
nearest_indices = assign(X, centroids)
new_centroids = update(X, nearest_indices, nclusters)
# Our error is defined as the square of the differences between centroid
error = euclidian_distance(centroids, new_centroids)
# The Gradient Descent Optimizer
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(error)
model = tf.initialize_all_variables()
with tf.Session() as session:
    data = session.run(samples)
    session.run(model)
    epsilon = 0.08
    err = float("inf")
    count = 0
    while err > epsilon:
        _, err = session.run([train_op, error], {X: data})
        print(err)
        clustering = session.run(nearest_indices)
        centers = session.run(centroids)
        count += 1
        # Plot each 100 iteration to see progress
        if (count % 100) == 0:
            print(count)
            plt.figure()
            plt.scatter(data[:,0], data[:,1], c=clustering)
            plt.scatter(centers[:,0], centers[:,1], s=300, c="orange", marker="x", linewidth=5)
    print("%d iterations" % count)
    plt.figure()
    plt.scatter(data[:,0], data[:,1], c=clustering)
    plt.scatter(centers[:,0], centers[:,1], s=300, c="orange", marker="x", linewidth=5)
This actually runs, but the result is disappointing:
After around 1600 iterations the result is still bad. I can't even figure out how some points get "lost" (that is, assigned the color of a cluster they are far away from). As I understand it, k-means should converge very quickly in a case like this, yet here it does not even converge to a good solution. Is this due to gradient descent? (I don't see how it could be, but...)
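For comparison, here is a minimal sketch of the plain assign/update loop in NumPy, run on a fixed array. It is not the tutorial's code: the function name lloyd_kmeans, the synthetic data and all parameters are made up for illustration. On a fixed array neither step can increase the within-cluster sum of squares, which is why the loop settles. One thing that may be worth checking in the TensorFlow version: assign and update read the global samples tensor instead of their data argument, and a tf.random_normal tensor is re-drawn on every session.run, so the plotted data and the plotted labels may come from different draws.

```python
import numpy as np

def lloyd_kmeans(data, k, n_iter=100, seed=0):
    """Plain assign/update iterations on a fixed array; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct sample points.
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: squared distance from every point to every centroid.
        d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: mean of each cluster (an empty cluster keeps its old centroid).
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical data: three Gaussian blobs, generated once and then kept fixed.
rng = np.random.default_rng(700)
true_centers = np.array([[-20.0, 0.0], [20.0, 0.0], [0.0, 30.0]])
data = np.concatenate([c + rng.normal(scale=2.0, size=(200, 2)) for c in true_centers])
centroids, labels = lloyd_kmeans(data, k=3)
print(centroids)
```

Like any k-means run, this can still end in a poor local optimum for an unlucky initialisation, so it is only a baseline to compare the gradient-descent version against.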
Thanks for any advice!
pltrdy

Related

Python optimization of prediction of random forest regressor

I have built a random forest regressor to predict the elasticity of a certain object based on color, material, size and other features.
The model works fine and I can predict the expected elasticity given certain inputs.
Eventually, I want to be able to find the lowest elasticity with certain constraints. The inputs have limited possibilities, i.e., material can only be plastic or textile.
I would like to have a smart solution in which I don't have to brute force and try all the possible combinations and find the one with lowest elasticity. I have found that surrogate models can be used for this but I don't understand how to apply this concept to my problem. For example, what is the objective function I should optimize in my case? I thought of passing the .predict() of the random forest but I'm not sure this is the correct way.
To summarize, I'd like a solution that, given certain conditions, tells me the set of features that gives the lowest elasticity. For example, if I'm looking for the lowest elasticity when the object is made of plastic, I'd like to receive the values of the other features that achieve it. Or, more simply, which feature I should tune to improve the performance.
import numpy as np  # the code below refers to np
from scipy.optimize import minimize
import random
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(n_estimators=10, random_state=0)
model.fit(X_train,y_train)
material = [0,1]
size= list(range(1, 45))
color= list(range(1, 500))
def objective(x):
    material= x[0]
    size = x[1]
    color = x[2]
    return model.predict([[material,size,color]])
# initial guesses
n = 3
x0 = np.zeros(n)
x0[0] = random.choice(material)
x0[1] = random.choice(size)
x0[2] = random.choice(color)
# optimize
b = (None,None)
bnds = (b, b, b, b, b)
solution = minimize(objective, x0, method='nelder-mead',
options={'xtol': 1e-8, 'disp': True})
x = solution.x
print('Final Objective: ' + str(objective(x)))

This is one solution if I understood you correctly,
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from scipy.optimize import differential_evolution
model = None
def objective(x):
    material= x[0]
    size = x[1]
    color = x[2]
    # predict returns an array of shape (1,); return its single value as a scalar
    return model.predict([[material,size,color]])[0]
# define input data
material = np.random.choice([0,1], 10); material = np.expand_dims(material, 1)
size = np.arange(10); size = np.expand_dims(size, 1)
color = np.arange(20, 30); color = np.expand_dims(color, 1)
input = np.concatenate((material, size, color), 1) # shape = (10, 3)
# define output = elasticity between [0, 1] i.e. 0-100%
elasticity = np.array([0.51135295, 0.54830051, 0.42198349, 0.72614775, 0.55087905,
0.99819945, 0.3175208 , 0.78232872, 0.11621277, 0.32219236])
# model and minimize
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(input, elasticity)
limits = ((0, 1), (0, 10), (20, 30))
res = differential_evolution(objective, limits, maxiter = 10000, seed = 11111)
min_y = model.predict([res.x])[0]
print("min_elasticity ==", round(min_y, 5))
The output is the minimal elasticity within the given limits:
min_elasticity == 0.19029
The data here are random, so the RandomForestRegressor perhaps doesn't do the best job.
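Since the domains in the question are small (2 materials, 44 sizes and 499 colors, about 44,000 combinations), an exhaustive batched evaluation can serve as a baseline to check an optimizer's result against. The sketch below is not the method from the answer above. The predict function is a hypothetical stand-in for model.predict, and best_combination is a made-up helper name. Fixing a feature (for example material = plastic) just means restricting that feature's list to one value.

```python
import itertools
import numpy as np

def predict(features):
    """Hypothetical stand-in for model.predict: one elasticity value per row."""
    material, size, color = features[:, 0], features[:, 1], features[:, 2]
    return 0.5 + 0.1 * material + 0.01 * np.abs(size - 20) + 0.001 * np.abs(color - 250)

materials = [0, 1]
sizes = range(1, 45)
colors = range(1, 500)

def best_combination(fixed_material=None):
    """Evaluate every allowed combination in one batch and return the minimiser."""
    mats = materials if fixed_material is None else [fixed_material]
    grid = np.array(list(itertools.product(mats, sizes, colors)), dtype=float)
    scores = predict(grid)  # a single batched call instead of one call per row
    best = int(np.argmin(scores))
    return grid[best], float(scores[best])

# Lowest elasticity when the material is fixed to 1.
x_best, y_best = best_combination(fixed_material=1)
print(x_best, y_best)
```

With a fitted forest, the same grid could be passed to model.predict directly, assuming its columns are in the order the model was trained on.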

dbscan not making sense for small amounts of points

I am playing around with a DBSCAN example to see if it will work for me. In my case I have clusters of a few points (3-5) close together, with a fairly long distance between clusters. I have tried to replicate the situation in the following code. I figured that with a low epsilon and low min_samples this should work, but instead it tells me that it sees only 1 group (and 20 noise points?). Am I using this incorrectly, or is DBSCAN not good for this type of problem? I went with DBSCAN instead of k-means because I don't know beforehand exactly how many clusters there will be (1-5).
from sklearn.datasets import make_blobs
from sklearn.cluster import DBSCAN
import numpy as np
import matplotlib.pyplot as plt
# Configuration options
num_samples_total = 20
cluster_centers = [(3,3), (7,7),(7,3),(3,7),(5,5)]
num_classes = len(cluster_centers)
#epsilon = 1.0
epsilon = 1e-5
#min_samples = 13
min_samples = 2
# Generate data
X, y = make_blobs(n_samples = num_samples_total, centers = cluster_centers, n_features = num_classes, center_box=(0, 1), cluster_std = 0.05)
np.save('./clusters.npy', X)
X = np.load('./clusters.npy')
# Compute DBSCAN
db = DBSCAN(eps=epsilon, min_samples=min_samples).fit(X)
labels = db.labels_
no_clusters = len(np.unique(labels) )
no_noise = np.sum(np.array(labels) == -1, axis=0)
print('Estimated no. of clusters: %d' % no_clusters)
print('Estimated no. of noise points: %d' % no_noise)
# Generate scatter plot for training data
colors = list(map(lambda x: '#3b4cc0' if x == 1 else '#b40426', labels)) #only set for 2 colors
plt.scatter(X[:,0], X[:,1], c=colors, marker="o", picker=True)
plt.title('Two clusters with data')
plt.xlabel('Axis X[0]')
plt.ylabel('Axis X[1]')
plt.show()
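Two things in the code above may explain the reported output. DBSCAN labels noise points as -1, and len(np.unique(labels)) counts that label as a group, so 20 noise points show up as "1 cluster". Also, eps = 1e-5 is far below the typical spacing of points drawn with cluster_std = 0.05, so no point has a neighbour. The NumPy sketch below reproduces the spacing on made-up data with the same centers and spread, and counts clusters without the noise label. The helper name count_clusters_and_noise is an assumption, not part of scikit-learn.

```python
import numpy as np

rng = np.random.default_rng(0)
centers = np.array([(3, 3), (7, 7), (7, 3), (3, 7), (5, 5)], dtype=float)
# Four points per centre with the same spread as in the question (std 0.05).
X = np.concatenate([c + rng.normal(scale=0.05, size=(4, 2)) for c in centers])

# Distance from each point to its nearest other point.
diff = X[:, None, :] - X[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))
np.fill_diagonal(dist, np.inf)
nearest = dist.min(axis=1)
print("smallest nearest-neighbour distance:", nearest.min())
print("largest nearest-neighbour distance:", nearest.max())

def count_clusters_and_noise(labels):
    """Count DBSCAN clusters, treating the label -1 as noise rather than a cluster."""
    labels = np.asarray(labels)
    n_clusters = len(set(labels.tolist()) - {-1})
    n_noise = int((labels == -1).sum())
    return n_clusters, n_noise

# All points labelled as noise: zero clusters, not one.
print(count_clusters_and_noise([-1] * 20))
```

An eps somewhat above the largest nearest-neighbour distance, and well below the distance between centers, would be a plausible starting point for data like this.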

I ended up going with k-means and a modified elbow method:
print(__doc__)
# Author: Phil Roth <mr.phil.roth#gmail.com>
# License: BSD 3 clause
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
# Configuration options
num_samples_total = 20
cluster_centers = [(3,3), (7,7),(7,3),(3,7),(5,5)]
num_classes = len(cluster_centers)
#epsilon = 1.0
epsilon = 1e-5
#min_samples = 13
min_samples = 2
# Generate data
X, y = make_blobs(n_samples = num_samples_total, centers = cluster_centers, n_features = num_classes, center_box=(0, 1), cluster_std = 0.05)
random_state = 170
#y_pred = KMeans(n_clusters=5, random_state=random_state).fit_predict(X)
#plt.scatter(X[:, 0], X[:, 1], c=y_pred)
#kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
#maybe I dont have to look for an elbow, just go until the value drops below 1.
#also if I do go too far, it just means that the same shape will be shown twice.
clusterIdx = 0
inertia = 100
while inertia > 1:
    clusterIdx = clusterIdx + 1
    kmeans = KMeans(n_clusters=clusterIdx, random_state=0).fit(X)
    inertia = kmeans.inertia_
    print(inertia)
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_)
print(clusterIdx)
plt.show()

How to do Constrained Linear Regression - scikit learn?

I am trying to carry out linear regression subject to some constraints in order to get a certain prediction.
I want the model to follow the ordinary linear prediction for the first half, and for the second half to stay within a very narrow range around the last value of the first half (enforced through constraints), similar to the green line in the figure.
The full code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
pd.options.mode.chained_assignment = None # default='warn'
data = [5.269, 5.346, 5.375, 5.482, 5.519, 5.57, 5.593999999999999, 5.627000000000001, 5.724, 5.818, 5.792999999999999, 5.817, 5.8389999999999995, 5.882000000000001, 5.92, 6.025, 6.064, 6.111000000000001, 6.1160000000000005, 6.138, 6.247000000000001, 6.279, 6.332000000000001, 6.3389999999999995, 6.3420000000000005, 6.412999999999999, 6.442, 6.519, 6.596, 6.603, 6.627999999999999, 6.76, 6.837000000000001, 6.781000000000001, 6.8260000000000005, 6.849, 6.875, 6.982, 7.018, 7.042000000000001, 7.068, 7.091, 7.204, 7.228, 7.261, 7.3420000000000005, 7.414, 7.44, 7.516, 7.542000000000001, 7.627000000000001, 7.667000000000001, 7.821000000000001, 7.792999999999999, 7.756, 7.871, 8.006, 8.078, 7.916, 7.974, 8.074, 8.119, 8.228, 7.976, 8.045, 8.312999999999999, 8.335, 8.388, 8.437999999999999, 8.456, 8.227, 8.266, 8.277999999999999, 8.289, 8.299, 8.318, 8.332, 8.34, 8.349, 8.36, 8.363999999999999, 8.368, 8.282, 8.283999999999999]
time = range(1,85,1)
x=int(0.7*len(data))
df = pd.DataFrame(list(zip(*[time, data])))
df.columns = ['time', 'data']
# print df
x=int(0.7*len(df))
train = df[:x]
valid = df[x:]
models = []
names = []
tr_x_ax = []
va_x_ax = []
pr_x_ax = []
tr_y_ax = []
va_y_ax = []
pr_y_ax = []
time_model = []
models.append(('LR', LinearRegression()))
for name, model in models:
    x_train=df.iloc[:, 0][:x].values
    y_train=df.iloc[:, 1][:x].values
    x_valid=df.iloc[:, 0][x:].values
    y_valid=df.iloc[:, 1][x:].values
    model = LinearRegression()
    # poly = PolynomialFeatures(5)
    x_train= x_train.reshape(-1, 1)
    y_train= y_train.reshape(-1, 1)
    x_valid = x_valid.reshape(-1, 1)
    y_valid = y_valid.reshape(-1, 1)
    # model.fit(x_train,y_train)
    model.fit(x_train,y_train.ravel())
    # score = model.score(x_train,y_train.ravel())
    # print 'score', score
    preds = model.predict(x_valid)
    tr_x_ax.extend(train['data'])
    va_x_ax.extend(valid['data'])
    pr_x_ax.extend(preds)
    valid['Predictions'] = preds
    valid.index = df[x:].index
    train.index = df[:x].index
plt.figure(figsize=(5,5))
# plt.plot(train['data'],label='data')
# plt.plot(valid[['Close', 'Predictions']])
x = valid['data']
# print x
# plt.plot(valid['data'],label='validation')
plt.plot(valid['Predictions'],label='Predictions before',color='orange')
y =range(0,58)
y1 =range(58,84)
for index, item in enumerate(pr_x_ax):
    if index >13:
        pr_x_ax[index] = pr_x_ax[13]
pr_x_ax = list([float(i) for i in pr_x_ax])
va_x_ax = list([float(i) for i in va_x_ax])
tr_x_ax = list([float(i) for i in tr_x_ax])
plt.plot(y,tr_x_ax, label='train' , color='red', linewidth=2)
plt.plot(y1,va_x_ax, label='validation1' , color='blue', linewidth=2)
plt.plot(y1,pr_x_ax, label='Predictions after' , color='green', linewidth=2)
plt.xlabel("time")
plt.ylabel("data")
plt.xticks(rotation=45)
plt.legend()
plt.show()
In the figure:
The "Predictions before" line is what the model predicts without any constraints (I don't need this result).
The "Predictions after" line is the prediction within a constraint, but it is produced after the model has predicted, and all values are set equal to the last value at index = 71, item 8.56.
I used the for loop for index, item in enumerate(pr_x_ax): (line 64) so that the curve is a straight line from time 71 to 85 sec, to show how I need the model to behave.
Can I build a model that gives the same result without the for loop?
I would appreciate your suggestions.

I take it that by drawing the green line you expect the trained model to predict a horizontal turn to the right, whereas the current model just draws the straight orange line.
For any model, whatever the algorithm, learning an unusual change in behavior requires at least some training samples that show that change, or at least some signal in the observed data that points to it.
In other words, for your model to learn the right turn of the green line, the training set must contain points from that turn. But you train only on the first (leftmost) 70% of the data, via train = df[:int(0.7 * len(df))]; that part contains no such turn and lies close to a single straight line.
So you need to split your data into training and validation sets differently: take a random 70% of the samples from the whole range of X for training and leave the rest for validation, so that the training data also includes samples from the right turn.
The second point is that LinearRegression always models its predictions as one single straight line, and such a line cannot turn. To get a turn you need a more complex model.
One way for a model to have a right turn is to be piecewise linear, i.e. to consist of several joined straight segments. I did not find a ready-made piecewise linear model in sklearn, only in other pip packages, so I implemented my own simple class PieceWiseLinearRegression, which uses np.piecewise() and scipy.optimize.curve_fit() to model a piecewise linear function.
The next picture shows the result of applying both changes (re-sampling the dataset and fitting a piecewise linear function); the code follows afterwards. Your current linear model LR still predicts a single straight blue line, while my piecewise linear model PWLR2 (orange line) consists of two segments and correctly predicts the right turn:
To show the PWLR2 graph on its own, I also made the next picture:
My class PieceWiseLinearRegression takes a single constructor argument n, the number of linear segments used for prediction. For the picture above, n = 2 was used.
import sys, numpy as np, pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
np.random.seed(0)
class PieceWiseLinearRegression:
    @classmethod
    def nargs_func(cls, f, n):
        return eval('lambda ' + ', '.join([f'a{i}'for i in range(n)]) + ': f(' + ', '.join([f'a{i}'for i in range(n)]) + ')', locals())
    @classmethod
    def piecewise_linear(cls, n):
        condlist = lambda xs, xa: [(lambda x: (
            (xs[i] <= x if i > 0 else np.full_like(x, True, dtype = np.bool_)) &
            (x < xs[i + 1] if i < n - 1 else np.full_like(x, True, dtype = np.bool_))
        ))(xa) for i in range(n)]
        funclist = lambda xs, ys: [(lambda i: (
            lambda x: (
                (x - xs[i]) * (ys[i + 1] - ys[i]) / (
                    (xs[i + 1] - xs[i]) if abs(xs[i + 1] - xs[i]) > 10 ** -7 else 10 ** -7 * (-1, 1)[xs[i + 1] - xs[i] >= 0]
                ) + ys[i]
            )
        ))(j) for j in range(n)]
        def f(x, *pargs):
            assert len(pargs) == (n + 1) * 2, (n, pargs)
            xs, ys = pargs[0::2], pargs[1::2]
            xa = x.ravel().astype(np.float64)
            ya = np.piecewise(x = xa, condlist = condlist(xs, xa), funclist = funclist(xs, ys)).ravel()
            #print('xs', xs, 'ys', ys, 'xa', xa, 'ya', ya)
            return ya
        return cls.nargs_func(f, 1 + (n + 1) * 2)
    def __init__(self, n):
        self.n = n
        self.f = self.piecewise_linear(self.n)
    def fit(self, x, y):
        from scipy import optimize
        self.p, self.e = optimize.curve_fit(self.f, x, y, p0 = [j for i in range(self.n + 1) for j in (np.amin(x) + i * (np.amax(x) - np.amin(x)) / self.n, 1)])
        #print('p', self.p)
    def predict(self, x):
        return self.f(x, *self.p)
data = [5.269, 5.346, 5.375, 5.482, 5.519, 5.57, 5.593999999999999, 5.627000000000001, 5.724, 5.818, 5.792999999999999, 5.817, 5.8389999999999995, 5.882000000000001, 5.92, 6.025, 6.064, 6.111000000000001, 6.1160000000000005, 6.138, 6.247000000000001, 6.279, 6.332000000000001, 6.3389999999999995, 6.3420000000000005, 6.412999999999999, 6.442, 6.519, 6.596, 6.603, 6.627999999999999, 6.76, 6.837000000000001, 6.781000000000001, 6.8260000000000005, 6.849, 6.875, 6.982, 7.018, 7.042000000000001, 7.068, 7.091, 7.204, 7.228, 7.261, 7.3420000000000005, 7.414, 7.44, 7.516, 7.542000000000001, 7.627000000000001, 7.667000000000001, 7.821000000000001, 7.792999999999999, 7.756, 7.871, 8.006, 8.078, 7.916, 7.974, 8.074, 8.119, 8.228, 7.976, 8.045, 8.312999999999999, 8.335, 8.388, 8.437999999999999, 8.456, 8.227, 8.266, 8.277999999999999, 8.289, 8.299, 8.318, 8.332, 8.34, 8.349, 8.36, 8.363999999999999, 8.368, 8.282, 8.283999999999999]
time = list(range(1, 85))
df = pd.DataFrame(list(zip(time, data)), columns = ['time', 'data'])
choose_train = np.random.uniform(size = (len(df),)) < 0.8
choose_valid = ~choose_train
x_all = df.iloc[:, 0].values
y_all = df.iloc[:, 1].values
x_train = df.iloc[:, 0][choose_train].values
y_train = df.iloc[:, 1][choose_train].values
x_valid = df.iloc[:, 0][choose_valid].values
y_valid = df.iloc[:, 1][choose_valid].values
x_all_lin = np.linspace(np.amin(x_all), np.amax(x_all), 500)
models = []
models.append(('LR', LinearRegression()))
models.append(('PWLR2', PieceWiseLinearRegression(2)))
for imodel, (name, model) in enumerate(models):
    model.fit(x_train[:, None], y_train)
    x_all_lin_pred = model.predict(x_all_lin[:, None])
    plt.plot(x_all_lin, x_all_lin_pred, label = f'pred {name}')
plt.plot(x_train, y_train, label='train')
plt.plot(x_valid, y_valid, label='valid')
plt.xlabel('time')
plt.ylabel('data')
plt.legend()
plt.show()
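For the two-segment case, a lighter alternative to the curve_fit approach above is a hinge basis. For a fixed breakpoint k, the model y = a + b*x + c*max(0, x - k) is linear in (a, b, c), so ordinary least squares fits it, and k can be chosen by scanning candidates. This is a different technique from the class above, shown only as a sketch. The names fit_hinge and predict_hinge and the synthetic data are assumptions for illustration.

```python
import numpy as np

def fit_hinge(x, y, candidates):
    """Least-squares fit of y = a + b*x + c*max(0, x - k), scanning k over candidates."""
    best = None
    for k in candidates:
        # Design matrix: intercept, slope, and the hinge term that switches on after k.
        A = np.column_stack([np.ones_like(x), x, np.maximum(0.0, x - k)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        sse = float(((A @ coef - y) ** 2).sum())
        if best is None or sse < best[0]:
            best = (sse, k, coef)
    return best[1], best[2]

def predict_hinge(x, k, coef):
    return coef[0] + coef[1] * x + coef[2] * np.maximum(0.0, x - k)

# Synthetic data (not the data from the question): slope 0.05 up to x = 60, flat afterwards.
x = np.arange(1.0, 85.0)
y = 5.0 + 0.05 * np.minimum(x, 60.0)
k, coef = fit_hinge(x, y, candidates=x[2:-2])
print(k, coef)
```

The slope after the breakpoint is coef[1] + coef[2], so a flat second segment corresponds to the two coefficients cancelling.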

Python Polynomial Regression with Gradient Descent

I am trying to implement polynomial regression with gradient descent. I want to fit the following function:
The code I use is:
import numpy as np
import matplotlib.pyplot as plt
import scipy.linalg
from sklearn.preprocessing import PolynomialFeatures
np.random.seed(seed=42)
def create_data():
    x = PolynomialFeatures(degree=5).fit_transform(np.linspace(-10,10,100).reshape(100,-1))
    l = lambda x_i: (1/3)*x_i**3-2*x_i**2+2*x_i+2
    data = l(x[:,1])
    noise = np.random.normal(0,0.1,size=np.shape(data))
    y = data+noise
    y= y.reshape(100,1)
    return {'x':x,'y':y}
def plot_function(x,y):
    fig = plt.figure(figsize=(10,10))
    plt.plot(x[:,1],[(1/3)*x_i**3-2*x_i**2+2*x_i+2 for x_i in x[:,1]],c='lightgreen',linewidth=3,zorder=0)
    plt.scatter(x[:,1],y)
    plt.show()
def w_update(y,x,batch,w_old,eta):
    derivative = np.sum([(y[i]-np.dot(w_old.T,x[i,:]))*x[i,:] for i in range(np.shape(x)[0])])
    print(derivative)
    return w_old+eta*(1/batch)*derivative
# initialize variables
w = np.random.normal(size=(6,1))
data = create_data()
x = data['x']
y = data['y']
plot_function(x,y)
# Update w
w_s = []
Error = []
for i in range(500):
    error = (1/2)*np.sum([(y[i]-np.dot(w.T,x[i,:]))**2 for i in range(len(x))])
    Error.append(error)
    w_prime = w_update(y,x,np.shape(x)[0],w,0.001)
    w = w_prime
    w_s.append(w)
# Plot the predicted function
plt.plot(x[:,1],np.dot(x,w))
plt.show()
# Plot the error
fig3 = plt.figure()
plt.scatter(range(len(Error[10:])),Error[10:])
plt.show()
But as a result I get something strange that is completely out of bounds... I have also tried changing the number of iterations and the learning rate eta, but it did not help. I assume I have made a mistake in the update of w.

I have found the solution. The problem is indeed in the part where I calculate the weights, specifically in:
np.sum([(y[d]-np.dot(w_old.T,x[d,:]))*x[d,:] for d in range(np.shape(x)[0])])
which should be like:
np.sum([-(y[d]-np.dot(w.T.copy(),x[d,:]))*x[d,:].reshape(np.shape(w)) for d in range(len(x))],axis=0)
We have to pass axis=0 to np.sum to get the dimensionality we want: the result must have the same shape as w. The numpy sum documentation says:
The default, axis=None, will sum all of the elements of the input array.
This is not what we want. Passing axis=0 sums over the first axis of our array, which has shape (100,7,1), so the 100 elements of shape (7,1) are added up and the result has shape (7,1), which is exactly what we want. Implementing this and cleaning up the code yields:
import numpy as np
import matplotlib.pyplot as plt
import scipy.linalg
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import MinMaxScaler
np.random.seed(seed=42)
def create_data():
    x = PolynomialFeatures(degree=6).fit_transform(np.linspace(-2,2,100).reshape(100,-1))
    x[:,1:] = MinMaxScaler(feature_range=(-2,2),copy=False).fit_transform(x[:,1:])
    l = lambda x_i: np.cos(0.8*np.pi*x_i)
    data = l(x[:,1])
    noise = np.random.normal(0,0.1,size=np.shape(data))
    y = data+noise
    y= y.reshape(100,1)
    # Normalize Data
    return {'x':x,'y':y}
def plot_function(x,y,w,Error,w_s):
    fig,ax = plt.subplots(nrows=1,ncols=2,figsize=(40,10))
    ax[0].plot(x[:,1],[np.cos(0.8*np.pi*x_i) for x_i in x[:,1]],c='lightgreen',linewidth=3,zorder=0)
    ax[0].scatter(x[:,1],y)
    ax[0].plot(x[:,1],np.dot(x,w))
    ax[0].set_title('Function')
    ax[1].scatter(range(iterations),Error)
    ax[1].set_title('Error')
    plt.show()
# initialize variables
data = create_data()
x = data['x']
y = data['y']
w = np.random.normal(size=(np.shape(x)[1],1))
eta = 0.1
iterations = 10000
batch = 10
def stochastic_gradient_descent(x,y,w,eta):
    derivative = -(y-np.dot(w.T,x))*x.reshape(np.shape(w))
    return eta*derivative
def batch_gradient_descent(x,y,w,eta):
    derivative = np.sum([-(y[d]-np.dot(w.T.copy(),x[d,:]))*x[d,:].reshape(np.shape(w)) for d in range(len(x))],axis=0)
    return eta*(1/len(x))*derivative
def mini_batch_gradient_descent(x,y,w,eta,batch):
    gradient_sum = np.zeros(shape=np.shape(w))
    for b in range(batch):
        choice = np.random.choice(list(range(len(x))))
        gradient_sum += -(y[choice]-np.dot(w.T,x[choice,:]))*x[choice,:].reshape(np.shape(w))
    return eta*(1/batch)*gradient_sum
# Update w
w_s = []
Error = []
for i in range(iterations):
    # Calculate error
    error = (1/2)*np.sum([(y[i]-np.dot(w.T,x[i,:]))**2 for i in range(len(x))])
    Error.append(error)
    # Stochastic Gradient Descent
    """
    for d in range(len(x)):
        w-= stochastic_gradient_descent(x[d,:],y[d],w,eta)
        w_s.append(w.copy())
    """
    # Minibatch Gradient Descent
    """
    w-= mini_batch_gradient_descent(x,y,w,eta,batch)
    """
    # Batch Gradient Descent
    w -= batch_gradient_descent(x,y,w,eta)
# Show predicted weights
print(w_s)
# Plot the predicted function and the Error
plot_function(x,y,w,Error,w_s)
As a result we get:
This can surely be improved by changing eta and the number of iterations, by switching to stochastic or mini-batch gradient descent, or by using more sophisticated optimization algorithms.
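The per-sample sum over axis 0 used in batch_gradient_descent can also be written as one matrix product, which avoids the Python loop and makes the (7,1) result shape explicit. The sketch below compares the two forms on random data. The function names batch_gradient and batch_gradient_loop and the test arrays are made up for illustration.

```python
import numpy as np

def batch_gradient(x, y, w):
    """Gradient of (1/(2N)) * sum((y - x @ w)**2) with respect to w, shape (d, 1)."""
    return -x.T @ (y - x @ w) / len(x)

def batch_gradient_loop(x, y, w):
    """The same quantity written as a per-sample sum over axis 0."""
    terms = [-(y[d] - w.T @ x[d, :]) * x[d, :].reshape(w.shape) for d in range(len(x))]
    return np.sum(terms, axis=0) / len(x)

rng = np.random.default_rng(42)
x = rng.normal(size=(100, 7))   # 100 samples, 7 polynomial features
y = rng.normal(size=(100, 1))
w = rng.normal(size=(7, 1))

g_vec = batch_gradient(x, y, w)
g_loop = batch_gradient_loop(x, y, w)
print(g_vec.shape, np.allclose(g_vec, g_loop))
```

An update step would then be w -= eta * batch_gradient(x, y, w), matching the sign convention of the code above.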

Animate Self Organizing Map in Tensorflow

I found this very helpful blog post on implementing self-organizing maps with TensorFlow. I tried running it on the scikit-learn iris data set and got the result shown in the image below. To see how the SOM evolves I would like to animate my graph, and this is where I got stuck. I found a basic example of an animation:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
fig2 = plt.figure()
x = np.arange(-9, 10)
y = np.arange(-9, 10).reshape(-1, 1)
base = np.hypot(x, y)
ims = []
for add in np.arange(15):
ims.append((plt.pcolor(x, y, base + add, norm=plt.Normalize(0, 30)),))
im_ani = animation.ArtistAnimation(fig2, ims, interval=50, repeat_delay=3000, blit=True)
plt.show()
To animate I must edit the train function of som.py because the training for loop is encapsulated there. It looks like this:
def train(self, input_vects):
    """
    Trains the SOM.
    'input_vects' should be an iterable of 1-D NumPy arrays with
    dimensionality as provided during initialization of this SOM.
    Current weightage vectors for all neurons(initially random) are
    taken as starting conditions for training.
    """
    #fig2 = plt.figure()
    #Training iterations
    for iter_no in tqdm(range(self._n_iterations)):
        #Train with each vector one by one
        for input_vect in input_vects:
            self._sess.run(self._training_op,
                           feed_dict={self._vect_input: input_vect,
                                      self._iter_input: iter_no})
    #Store a centroid grid for easy retrieval later on
    centroid_grid = [[] for i in range(self._m)]
    self._weightages = list(self._sess.run(self._weightage_vects))
    self._locations = list(self._sess.run(self._location_vects))
    for i, loc in enumerate(self._locations):
        centroid_grid[loc[0]].append(self._weightages[i])
    #im_ani = animation.ArtistAnimation(fig2, centroid_grid, interval=50, repeat_delay=3000, blit=True)
    self._centroid_grid = centroid_grid
    self._trained = True
    #plt.show()
The commented-out lines are my attempt to implement the animation, but it doesn't work: in the basic example the ims list holds matplotlib objects, while in the training function the list is a 4d numpy array.
To sum up: how can I animate my plot? Thanks in advance for your help.
Here is my full code:
som.py
import tensorflow as tf
import numpy as np
from tqdm import tqdm
import matplotlib.animation as animation
from matplotlib import pyplot as plt
import time
class SOM(object):
    """
    2-D Self-Organizing Map with Gaussian Neighbourhood function
    and linearly decreasing learning rate.
    """
    #To check if the SOM has been trained
    _trained = False
    def __init__(self, m, n, dim, n_iterations=100, alpha=None, sigma=None):
        """
        Initializes all necessary components of the TensorFlow
        Graph.
        m X n are the dimensions of the SOM. 'n_iterations' should
        should be an integer denoting the number of iterations undergone
        while training.
        'dim' is the dimensionality of the training inputs.
        'alpha' is a number denoting the initial time(iteration no)-based
        learning rate. Default value is 0.3
        'sigma' is the the initial neighbourhood value, denoting
        the radius of influence of the BMU while training. By default, its
        taken to be half of max(m, n).
        """
        #Assign required variables first
        self._m = m
        self._n = n
        if alpha is None:
            alpha = 0.3
        else:
            alpha = float(alpha)
        if sigma is None:
            sigma = max(m, n) / 2.0
        else:
            sigma = float(sigma)
        self._n_iterations = abs(int(n_iterations))
        ##INITIALIZE GRAPH
        self._graph = tf.Graph()
        ##POPULATE GRAPH WITH NECESSARY COMPONENTS
        with self._graph.as_default():
            ##VARIABLES AND CONSTANT OPS FOR DATA STORAGE
            #Randomly initialized weightage vectors for all neurons,
            #stored together as a matrix Variable of size [m*n, dim]
            self._weightage_vects = tf.Variable(tf.random_normal(
                [m*n, dim]))
            #Matrix of size [m*n, 2] for SOM grid locations
            #of neurons
            self._location_vects = tf.constant(np.array(
                list(self._neuron_locations(m, n))))
            ##PLACEHOLDERS FOR TRAINING INPUTS
            #We need to assign them as attributes to self, since they
            #will be fed in during training
            #The training vector
            self._vect_input = tf.placeholder("float", [dim])
            #Iteration number
            self._iter_input = tf.placeholder("float")
            ##CONSTRUCT TRAINING OP PIECE BY PIECE
            #Only the final, 'root' training op needs to be assigned as
            #an attribute to self, since all the rest will be executed
            #automatically during training
            #To compute the Best Matching Unit given a vector
            #Basically calculates the Euclidean distance between every
            #neuron's weightage vector and the input, and returns the
            #index of the neuron which gives the least value
            bmu_index = tf.argmin(tf.sqrt(tf.reduce_sum(
                tf.pow(tf.subtract(self._weightage_vects, tf.stack([self._vect_input for i in range(m*n)])), 2), 1)), 0)
            #This will extract the location of the BMU based on the BMU's
            #index
            slice_input = tf.pad(tf.reshape(bmu_index, [1]),
                                 np.array([[0, 1]]))
            bmu_loc = tf.reshape(tf.slice(self._location_vects, slice_input,
                                          tf.constant(np.array([1, 2]))),
                                 [2])
            #To compute the alpha and sigma values based on iteration
            #number
            learning_rate_op = tf.subtract(1.0, tf.div(self._iter_input,
                                                       self._n_iterations))
            _alpha_op = tf.multiply(alpha, learning_rate_op)
            _sigma_op = tf.multiply(sigma, learning_rate_op)
            #Construct the op that will generate a vector with learning
            #rates for all neurons, based on iteration number and location
            #wrt BMU.
            bmu_distance_squares = tf.reduce_sum(tf.pow(tf.subtract(
                self._location_vects, tf.stack(
                    [bmu_loc for i in range(m*n)])), 2), 1)
            neighbourhood_func = tf.exp(tf.negative(tf.div(tf.cast(
                bmu_distance_squares, "float32"), tf.pow(_sigma_op, 2))))
            learning_rate_op = tf.multiply(_alpha_op, neighbourhood_func)
            #Finally, the op that will use learning_rate_op to update
            #the weightage vectors of all neurons based on a particular
            #input
            learning_rate_multiplier = tf.stack([tf.tile(tf.slice(
                learning_rate_op, np.array([i]), np.array([1])), [dim])
                for i in range(m*n)])
            weightage_delta = tf.multiply(
                learning_rate_multiplier,
                tf.subtract(tf.stack([self._vect_input for i in range(m*n)]),
                            self._weightage_vects))
            new_weightages_op = tf.add(self._weightage_vects,
                                       weightage_delta)
            self._training_op = tf.assign(self._weightage_vects,
                                          new_weightages_op)
            ##INITIALIZE SESSION
            self._sess = tf.Session()
            ##INITIALIZE VARIABLES
            init_op = tf.global_variables_initializer()
            self._sess.run(init_op)
    def _neuron_locations(self, m, n):
        """
        Yields one by one the 2-D locations of the individual neurons
        in the SOM.
        """
        #Nested iterations over both dimensions
        #to generate all 2-D locations in the map
        for i in range(m):
            for j in range(n):
                yield np.array([i, j])
    def train(self, input_vects):
        """
        Trains the SOM.
        'input_vects' should be an iterable of 1-D NumPy arrays with
        dimensionality as provided during initialization of this SOM.
        Current weightage vectors for all neurons(initially random) are
        taken as starting conditions for training.
        """
        #fig2 = plt.figure()
        #Training iterations
        for iter_no in tqdm(range(self._n_iterations)):
            #Train with each vector one by one
            for input_vect in input_vects:
                self._sess.run(self._training_op,
                               feed_dict={self._vect_input: input_vect,
                                          self._iter_input: iter_no})
        #Store a centroid grid for easy retrieval later on
        centroid_grid = [[] for i in range(self._m)]
        self._weightages = list(self._sess.run(self._weightage_vects))
        self._locations = list(self._sess.run(self._location_vects))
        for i, loc in enumerate(self._locations):
            centroid_grid[loc[0]].append(self._weightages[i])
        #im_ani = animation.ArtistAnimation(fig2, centroid_grid, interval=50, repeat_delay=3000, blit=True)
        self._centroid_grid = centroid_grid
        #print(centroid_grid)
        self._trained = True
        #plt.show()
    def get_centroids(self):
        """
        Returns a list of 'm' lists, with each inner list containing
        the 'n' corresponding centroid locations as 1-D NumPy arrays.
        """
        if not self._trained:
            raise ValueError("SOM not trained yet")
        return self._centroid_grid
    def map_vects(self, input_vects):
        """
        Maps each input vector to the relevant neuron in the SOM
        grid.
        'input_vects' should be an iterable of 1-D NumPy arrays with
        dimensionality as provided during initialization of this SOM.
        Returns a list of 1-D NumPy arrays containing (row, column)
        info for each input vector(in the same order), corresponding
        to mapped neuron.
        """
        if not self._trained:
            raise ValueError("SOM not trained yet")
        to_return = [self._locations[min([i for i in range(len(self._weightages))],
                                         key=lambda x: np.linalg.norm(vect-self._weightages[x]))] for vect in input_vects]
        return to_return
usage.py
from matplotlib import pyplot as plt
import matplotlib.animation as animation
import numpy as np
from som import SOM
from sklearn.datasets import load_iris
data = load_iris()
flower_data = data['data']
normed_flower_data = flower_data / flower_data.max(axis=0)
target_int = data['target']
target_names = data['target_names']
targets = [target_names[i] for i in target_int]
#Train a 25x25 SOM with 100 iterations
som = SOM(25, 25, 4, 100) # My parameters
som.train(normed_flower_data)
#Get output grid
image_grid = som.get_centroids()
#Map colours to their closest neurons
mapped = som.map_vects(normed_flower_data)
#Plot
plt.imshow(image_grid)
plt.title('SOM')
for i, m in enumerate(mapped):
    plt.text(m[1], m[0], targets[i], ha='center', va='center',
             bbox=dict(facecolor='white', alpha=0.5, lw=0))
plt.show()
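ArtistAnimation expects a list of frames in which each frame is itself a list of matplotlib artists, so one possible approach is to turn every weight snapshot into an image artist with imshow and collect those. The sketch below shows the idea without TensorFlow. The random update is only a stand-in for one training pass of SOM.train, and the grid size, dimensionality and number of frames are made up. In the real class, the snapshot would come from self._sess.run(self._weightage_vects) inside the iteration loop.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.animation as animation

m, n, dim = 5, 5, 3
rng = np.random.default_rng(0)
weights = rng.random((m * n, dim))  # stand-in for the SOM weight matrix

fig, ax = plt.subplots()
frames = []
for iter_no in range(10):
    # Stand-in for one training pass over the inputs.
    weights = np.clip(weights + 0.05 * rng.normal(size=weights.shape), 0.0, 1.0)
    # Neuron locations are generated row by row, so a reshape recovers the (m, n) grid.
    grid = weights.reshape(m, n, dim)
    im = ax.imshow(grid, animated=True)
    frames.append([im])  # each frame is a list of artists, not a raw array

ani = animation.ArtistAnimation(fig, frames, interval=50, repeat_delay=3000, blit=True)
# To view it, call plt.show() with an interactive backend, or save it, for example:
# ani.save("som.gif", writer="pillow")
```

Keeping a reference to ani matters, since the animation stops if the object is garbage-collected.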
