Predict with sklearn-KNN using median (instead of mean) - python

Sklearn-KNN allows one to set weights (e.g., uniform, distance) that are used when calculating the mean of the k nearest neighbours.
Instead of predicting with the mean, is it possible to predict with the median (perhaps with a user-defined function)?

There is no built-in parameter to adjust the weighting to use the median rather than the mean (you can see in the source that the mean is hard-coded). But because scikit-learn estimators are just Python classes, you can subclass KNeighborsRegressor and override the predict method to do whatever you want.
Here's a quick example, where I've copied and pasted the original predict() method and modified the relevant piece:
import numpy as np
# Note: this import path matches older scikit-learn releases; in newer versions
# KNeighborsRegressor lives in sklearn.neighbors, check_array in
# sklearn.utils.validation, and _get_weights in a private neighbors module.
from sklearn.neighbors.regression import KNeighborsRegressor, check_array, _get_weights

class MedianKNNRegressor(KNeighborsRegressor):
    def predict(self, X):
        X = check_array(X, accept_sparse='csr')
        neigh_dist, neigh_ind = self.kneighbors(X)
        weights = _get_weights(neigh_dist, self.weights)
        _y = self._y
        if _y.ndim == 1:
            _y = _y.reshape((-1, 1))
        ######## Begin modification
        if weights is None:
            y_pred = np.median(_y[neigh_ind], axis=1)
        else:
            # y_pred = weighted_median(_y[neigh_ind], weights, axis=1)
            raise NotImplementedError("weighted median")
        ######### End modification
        if self._y.ndim == 1:
            y_pred = y_pred.ravel()
        return y_pred
X = np.random.rand(100, 1)
y = 20 * X.ravel() + np.random.rand(100)
clf = MedianKNNRegressor().fit(X, y)
print(clf.predict(X[:5]))
# [ 2.38172861 13.3871126 9.6737255 2.77561858 17.07392584]
I've left out the weighted version, because I don't know of a simple way to compute a weighted median with numpy/scipy, but it would be straightforward to add in once that function is available.
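For completeness, here is a rough sketch of how such a weighted-median helper could look (this is not from the original answer; the function name only mirrors the commented-out call above, and it handles just the single-output, 2-D case):
import numpy as np

def weighted_median(values, weights):
    """Row-wise weighted median of a 2-D array (one row per query point,
    one column per neighbour): the value at which the cumulative weight
    first reaches half of the total weight."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values, axis=1)
    v_sorted = np.take_along_axis(values, order, axis=1)
    w_sorted = np.take_along_axis(weights, order, axis=1)
    cum_w = np.cumsum(w_sorted, axis=1)
    total = cum_w[:, -1:]
    # index of the first neighbour where the cumulative weight crosses 50%
    idx = np.argmax(cum_w >= 0.5 * total, axis=1)
    return np.take_along_axis(v_sorted, idx[:, None], axis=1).ravel()
With such a helper in place, the else branch above could compute y_pred from _y[neigh_ind] and weights instead of raising NotImplementedError.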

Related

Use if/else logic in tensorflow to either add an element to one tensor or another

I am building a custom loss function that needs to know whether the truth and the prediction have at least N pixels above a threshold, because the logic breaks if np.where() returns an empty array. I can get around this issue with try/except, returning a 'flagged constant' when the function fails on the empty set, but I'd like to do something different. Here is my current method.
def some_loss(cutoff=20, min_pix=10):
    def gen_loss(y_true, y_pred):
        trues = tf.map_fn(fn=lambda x: x, elems=y_true)
        preds = tf.map_fn(fn=lambda x: x, elems=y_pred)
        for idx in tf.range(tf.shape(y_true)[0]):
            # binarize both by cutoff
            true = y_true[idx]
            pred = y_pred[idx]
            true = tf.where(true < cutoff, 0.0, 1.0)
            pred = tf.where(pred < cutoff, 0.0, 1.0)
            # now I sum each to get the number of pixels above threshold
            n_true, n_pred = tf.reduce_sum(true), tf.reduce_sum(pred)
            # then I create a switch using tf.conditional
            switch = tf.cond(tf.logical_or(n_true < min_pix, n_pred < min_pix),
                             lambda: tf.zeros_like(true), lambda: tf.ones_like(true))
            # this essentially allows me to turn off the loss if either condition is met
            # so I then run the function
            loss = get_loss(true, pred)  # returns random constant if either is below threshold
            loss += tf.reduce_sum(tf.math.multiply(loss, switch))
        return loss
    return gen_loss
This may work, it compiles and trains a convolutional model. However, I don't like that there are random constants wandering about my loss function, and I'd rather only operate the function get_loss() if both true and pred meet the minimum conditions.
I'd prefer to make two tensors, one with samples not meeting the condition, the other with samples meeting the condition.
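For what it's worth, a minimal sketch of that splitting idea (not from the original post) could use a per-sample boolean mask; the cutoff/min_pix names simply mirror the parameters above, and float image tensors are assumed:
import tensorflow as tf

def split_batch_by_pixel_count(y_true, y_pred, cutoff=20.0, min_pix=10.0):
    """Illustrative only: split a batch into samples where both the truth and
    the prediction have at least `min_pix` pixels above `cutoff`, and the rest."""
    # flatten each sample so the pixel count becomes a per-sample scalar
    t_flat = tf.reshape(y_true, (tf.shape(y_true)[0], -1))
    p_flat = tf.reshape(y_pred, (tf.shape(y_pred)[0], -1))
    n_true = tf.reduce_sum(tf.where(t_flat < cutoff, 0.0, 1.0), axis=1)
    n_pred = tf.reduce_sum(tf.where(p_flat < cutoff, 0.0, 1.0), axis=1)
    keep = tf.logical_and(n_true >= min_pix, n_pred >= min_pix)
    # one pair of tensors for samples meeting the condition, one for the rest
    kept = (tf.boolean_mask(y_true, keep), tf.boolean_mask(y_pred, keep))
    rest = (tf.boolean_mask(y_true, tf.logical_not(keep)),
            tf.boolean_mask(y_pred, tf.logical_not(keep)))
    return kept, rest
A loss like get_loss() could then be applied only to the kept samples, avoiding the per-sample Python loop and the flagged constants.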
Separately, I've tried to use tf.conditional to test for each case and call a separate loss function in either case. The code is repeated below.
def avgMED(scaler, cutoff=20, min_N=30, c=3):
    def AVGmed(y_true, y_pred):
        const = tf.constant([c], tf.float32) # constant c, multiplied by MED (
        batch_size = tf.cast(tf.shape(y_true)[0], tf.float32)
        MSE = tf.reduce_mean(tf.square(y_true - y_pred))
        y_true = tf.reshape(y_true, shape=(tf.shape(y_true)[0], -1))
        y_pred = tf.reshape(y_pred, shape=(tf.shape(y_pred)[0], -1))
        loss, loss_med = tf.cast(0, dtype=tf.float32), tf.cast(0, dtype=tf.float32)
        # rescale
        y_true = y_true * scaler.scale_
        y_true = y_true + scaler.mean_
        y_pred = y_pred * scaler.scale_
        y_pred = y_pred + scaler.mean_
        trues = tf.map_fn(fn=lambda x: x, elems=y_true)
        preds = tf.map_fn(fn=lambda x: x, elems=y_pred)
        min_nonzero_pixels = tf.reduce_sum(tf.constant(min_N, dtype=tf.float32))
        for idx in tf.range(batch_size):
            idx = tf.cast(idx, tf.int32)
            true = trues[idx]
            pred = preds[idx]
            MSE = tf.reduce_mean(tfm.square(tfm.subtract(true, pred)))
            true = tf.where(true < cutoff, 0.0, 1.0)
            pred = tf.where(pred < cutoff, 0.0, 1.0)
            n_true = tf.reduce_sum(true)
            n_pred = tf.reduce_sum(pred)
            loss_TA = tf.cond(tf.logical_or(n_true < min_nonzero_pixels, n_pred < min_nonzero_pixels),
                              get_zero(true, pred), get_MED(true, pred))
            loss_med += loss_TA.read(0)
            loss += loss_med + MSE # do we benefit from reducing across the batch dimension? we should be able to look at familiar batches and see the little increase due to the distance component
            tf.print(n_true, n_pred)
        tf.print(loss_med)
        return loss # this is essentially MSE given c ~ 0. Thus, this will show if there are some weird gradients flowing through that are preventing the model from learning
    return AVGmed

def get_MED(A, B):
    # takes in binary tensors
    indices_A, indices_B = tf.where(A), tf.where(B)
    coordX_A_TA, coordY_A_TA = find_coord(indices_A) # finds x,y coordinates and returns tensor array
    coordX_B_TA, coordY_B_TA = find_coord(indices_B)
    mindists_AB_TA = find_min_distances(coordX_A_TA, coordY_A_TA, coordX_B_TA, coordY_B_TA)
    mindists_BA_TA = find_min_distances(coordX_B_TA, coordY_B_TA, coordX_A_TA, coordY_A_TA)
    # MED = mean error distance =
    med_AB = tf.reduce_mean(mindists_AB_TA.read(0))
    med_BA = tf.reduce_mean(mindists_BA_TA.read(0))
    avg_med = tfm.divide(tfm.add(med_AB, med_BA), tf.constant(0.5))
    loss_TA = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
    loss_TA.write(loss_TA.size(), avg_med)
    return loss_TA

def get_zero(A, B):
    loss_TA = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
    loss_TA.write(loss_TA.size(), 0)
    return loss_TA
However, with this framework I am now getting new errors about my generator not having enough data, which is absurd given that the batch size I test with is 10 and there is 1 step_per_epoch on a training set of 100. I also got a warning about not closing the TensorArray, which I expect happens whether the conditional is true or false. I'm inching closer to a solution but could use some guidance on how problematic my tensorflow logic is.

Need help implementing a custom loss function in lightGBM (Zero-inflated Log Normal Loss)

I'm trying to implement the zero-inflated lognormal loss function from this paper (https://arxiv.org/pdf/1912.07753.pdf, page 5) in LightGBM. But, admittedly, I just don't know how. I don't understand how to get the gradient and hessian of this function in order to implement it in LGBM, and I've never needed to implement a custom loss function before.
The authors of the paper have open-sourced their code, and the function is available in tensorflow (https://github.com/google/lifetime_value/blob/master/lifetime_value/zero_inflated_lognormal.py), but I'm unable to translate it to fit the parameters required for a custom loss function in LightGBM. As an example of how LGBM accepts custom loss functions, a log-likelihood loss would be written as:
def loglikelihood(preds, train_data):
    labels = train_data.get_label()
    preds = 1. / (1. + np.exp(-preds))
    grad = preds - labels
    hess = preds * (1. - preds)
    return grad, hess
Similarly, I would need to define a custom eval metric to accompany it, such as:
def binary_error(preds, train_data):
    labels = train_data.get_label()
    preds = 1. / (1. + np.exp(-preds))
    return 'error', np.mean(labels != (preds > 0.5)), False
Both of the above two examples are taken from the following repository:
https://github.com/microsoft/LightGBM/blob/e83042f20633d7f74dda0d18624721447a610c8b/examples/python-guide/advanced_example.py#L136
Would appreciate any help on this, and especially detailed guidance to help me learn how to do this on my own.
According to the LGBM documentation for custom loss functions:
It should have the signature objective(y_true, y_pred) -> grad, hess or objective(y_true, y_pred, group) -> grad, hess:
y_true: numpy 1-D array of shape = [n_samples]
The target values.
y_pred: numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
The predicted values. Predicted values are returned before any transformation, e.g. they are raw margin instead of probability of positive class for binary task.
group: numpy 1-D array
Group/query data. Only used in the learning-to-rank task. sum(group) = n_samples. For example, if you have a 100-document dataset with group = [10, 20, 40, 10, 10, 10], that means that you have 6 groups, where the first 10 records are in the first group, records 11-30 are in the second group, records 31-70 are in the third group, etc.
grad: numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
The value of the first order derivative (gradient) of the loss with respect to the elements of y_pred for each sample point.
hess: numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
The value of the second order derivative (Hessian) of the loss with respect to the elements of y_pred for each sample point.
This is the "translation", as you defined it, of the tensorflow implementation. Most of the work is just defining the functions yourself (i.e. softplus, crossentropy, etc.)
The mean absolute percentage error is used in the linked paper, not sure if that is the eval metric you want to use.
import math
import numpy as np
from scipy import stats   # used below for the lognormal log-density

epsilon = 1e-7

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softplus(x, beta=1, threshold=20):
    # simplified stand-in for tf.math.softplus (the threshold argument is unused here)
    return 1 / beta * np.log(1 + np.exp(beta * x))

def BinaryCrossEntropy(y_true, y_pred):
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    term_0 = (1 - y_true) * np.log(1 - y_pred + epsilon)
    term_1 = y_true * np.log(y_pred + epsilon)
    return -np.mean(term_0 + term_1, axis=0)

def zero_inflated_lognormal_pred(logits):
    positive_probs = sigmoid(logits[..., :1])
    loc = logits[..., 1:2]
    scale = softplus(logits[..., 2:])
    preds = positive_probs * np.exp(loc + 0.5 * np.square(scale))
    return preds

def mean_abs_pct_error(preds, train_data):
    labels = train_data.get_label()
    decile_labels = np.percentile(labels, np.linspace(10, 100, 10))
    decile_preds = np.percentile(preds, np.linspace(10, 100, 10))
    MAPE = sum(np.absolute(decile_preds - decile_labels) / decile_labels)
    return 'error', MAPE, False

def zero_inflated_lognormal_loss(train_data, logits):
    # labels as a column vector so they broadcast against the (n_samples, 3) logits
    labels = train_data.get_label().reshape(-1, 1)
    positive = (labels > 0).astype(np.float64)
    positive_logits = logits[..., :1]
    # binary cross-entropy expects probabilities, so squash the logits first
    classification_loss = BinaryCrossEntropy(
        y_true=positive, y_pred=sigmoid(positive_logits))
    loc = logits[..., 1:2]
    scale = np.maximum(softplus(logits[..., 2:]), math.sqrt(epsilon))
    safe_labels = positive * labels + (1 - positive) * np.ones(labels.shape)
    # the tensorflow version uses a LogNormal distribution's log_prob; scipy's
    # lognorm gives the same log-density with s=sigma and scale=exp(mu)
    log_prob = stats.lognorm.logpdf(safe_labels, s=scale, scale=np.exp(loc))
    regression_loss = -np.mean(positive * log_prob, axis=-1)
    return classification_loss + regression_loss
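Not part of the original answer, but for orientation, this is roughly how a custom objective and eval metric are wired into the native training API in the linked advanced_example.py, reusing the loglikelihood and binary_error functions shown above; the data here is a toy placeholder, and LightGBM's objective callback must return per-sample grad and hess, which still has to be derived for the ZILN loss. In recent LightGBM releases the callable objective is passed as params['objective'] instead of the fobj argument.
import lightgbm as lgb
import numpy as np

# toy data only, so the snippet is self-contained
X = np.random.rand(200, 5)
y = (X[:, 0] + 0.1 * np.random.rand(200) > 0.5).astype(float)
train_set = lgb.Dataset(X, label=y)

params = {'learning_rate': 0.1, 'verbose': -1}

# older-style API as used in the linked example: custom objective via fobj,
# custom eval metric via feval
booster = lgb.train(params,
                    train_set,
                    num_boost_round=20,
                    fobj=loglikelihood,
                    feval=binary_error)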

Can I implement an iterative training loop for the ELM classifier?

I am using an Extreme Learning Machine classifier for hand-gesture recognition, but I still only get about 20% accuracy. Can anyone help me implement an iterative training loop to improve the accuracy? I am a beginner, and here is the code I am using: I split the dataset that I prepared into train and test parts after normalization, train it with the train function by calculating the Moore-Penrose inverse, and then predict the class of each gesture with the prediction function.
# -*- coding: utf-8 -*-
"""
Created on Sat Jul 4 17:52:25 2020
#author: lenovo
"""
__author__ = 'Sarra'

import numpy as np

class ELM(object):
    def __init__(self, inputSize, outputSize, hiddenSize):
        """
        Initialize weight and bias between input layer and hidden layer
        Parameters:
        inputSize: int
            The number of input layer dimensions or features in the training data
        outputSize: int
            The number of output layer dimensions
        hiddenSize: int
            The number of hidden layer dimensions
        """
        self.inputSize = inputSize
        self.outputSize = outputSize
        self.hiddenSize = hiddenSize
        # Initialize random weight with range [-0.5, 0.5]
        self.weight = np.matrix(np.random.uniform(-0.5, 0.5, (self.hiddenSize, self.inputSize)))
        # Initialize random bias with range [0, 1]
        self.bias = np.matrix(np.random.uniform(0, 1, (1, self.hiddenSize)))
        self.H = 0
        self.beta = 0

    def sigmoid(self, x):
        """
        Sigmoid activation function
        Parameters:
        x: array-like or matrix
            The value that the activation output will look for
        Returns:
            The results of activation using sigmoid function
        """
        return 1 / (1 + np.exp(-1 * x))

    def predict(self, X):
        """
        Predict the results of the training process using test data
        Parameters:
        X: array-like or matrix
            Test data that will be used to determine output using ELM
        Returns:
            Predicted results or outputs from test data
        """
        X = np.matrix(X)
        y = self.sigmoid((X * self.weight.T) + self.bias) * self.beta
        return y

    def train(self, X, y):
        """
        Extreme Learning Machine training process
        Parameters:
        X: array-like or matrix
            Training data that contains the value of each feature
        y: array-like or matrix
            Training data that contains the value of the target (class)
        Returns:
            The results of the training process
        """
        X = np.matrix(X)
        y = np.matrix(y)
        # Calculate hidden layer output matrix (Hinit)
        self.H = (X * self.weight.T) + self.bias
        # Sigmoid activation function
        self.H = self.sigmoid(self.H)
        # Calculate the Moore-Penrose pseudoinverse matrix
        H_moore_penrose = np.linalg.pinv(self.H.T * self.H) * self.H.T
        # Calculate the output weight matrix beta
        self.beta = H_moore_penrose * y
        return self.H * self.beta
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import preprocessing
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
# read the dataset
database = pd.read_csv(r"C:\\Users\\lenovo\\tensorflow\\tensorflow1\\Numpy-ELM\\hand_gestures_database.csv")
#separate data from labels
data = database.iloc[:, 1:].values.astype('float64')
#normalize data
#n_data = preprocessing.minmax_scale(data, feature_range=(0, 1), axis=0, copy=True)
scaler = MinMaxScaler()
scaler.fit(data)
n_data = scaler.transform(data)
#identify the labels
label = database.iloc[:, 0]
#encoding labels to transform each label to a value between 0 to number of labels-1
def prepare_targets(n):
    le = preprocessing.LabelEncoder()
    le.fit(n)
    label_enc = le.transform(n)
    return label_enc
label_enc = prepare_targets(label)
CLASSES = 10
#transform the value of each label to a binary vector
target = np.zeros([label_enc.shape[0], CLASSES])
for i in range(label_enc.shape[0]):
    target[i][label_enc[i]] = 1
target.view(type=np.matrix)
print("target",target)
# Create instance of ELM object with 10 hidden neuron
maxx = 0
for u in range(10):
    elm = ELM(45, 10, 10)
    # Train test split 80:20
    X_train, X_test, y_train, y_test = train_test_split(n_data, target, test_size=0.34, random_state=1)
    elm.train(X_train, y_train)
    y_pred = elm.predict(X_test)
    # Train data
    correct = 0
    total = y_pred.shape[0]
    for i in range(total):
        predicted = np.argmax(y_pred[i])
        test = np.argmax(y_test[i])
        correct = correct + (1 if predicted == test else 0)
    print('Accuracy: {:f}'.format(correct / total))
    if correct / total > maxx:
        maxx = correct / total
print(maxx)
###confusion matrix
import seaborn as sns
y_pred=np.argmax(y_pred, axis=1)
y_true=(np.argmax(y_test, axis=1))
target_names=["G1","G2","G3","G4","G5","G6","G7","G8","G9","G10"]
cm=confusion_matrix(y_true, y_pred)
#cmn = cm.astype('float')/cm.sum(axis=1)[:, np.newaxis]*100
fig, ax = plt.subplots(figsize=(15,8))
sns.heatmap(cm/np.sum(cm), annot=True, fmt='.2f',xticklabels=target_names, yticklabels=target_names, cmap='Blues')
#sns.heatmap(cmn, annot=True, fmt='.2%', xticklabels=target_names, yticklabels=target_names)
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.ylim(-0.5, len(target_names) + 0.5)
plt.show(block=False)
def perf_measure(y_actual, y_pred):
    TP = 0
    FP = 0
    TN = 0
    FN = 0
    for i in range(len(y_pred)):
        if y_actual[i] == y_pred[i] == 1:
            TP += 1
        if y_pred[i] == 1 and y_actual[i] != y_pred[i]:
            FP += 1
        if y_actual[i] == y_pred[i] == 0:
            TN += 1
        if y_pred[i] == 0 and y_actual[i] != y_pred[i]:
            FN += 1
    return (TP, FP, TN, FN)
TP, FP, TN, FN=perf_measure(y_true, y_pred)
print("precision",TP/(TP+FP))
print("sensivity",TP/(TP+FN))
print("specifity",TN/(TN+FP))
print("accuracy",(TP+TN)/(TN+FP+FN+TP))
To your question about whether you can implement an iterative training loop for an ELM:
No, you cannot. An ELM consists of one random layer followed by an output layer. Because the first layer is fixed, this is essentially a linear model, and we can find the optimal output weights by using the pseudo-inverse, as you pointed out.
However, since you already find the perfect solution for this model in one step, there is no direct way to iteratively improve this result.
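To make the "one step" point concrete, here is a small sketch (not from the original answer) of the closed-form solve: once the hidden activations are fixed, the least-squares output weights come straight from the pseudo-inverse, so there is no residual left to iterate on.
import numpy as np

rng = np.random.default_rng(0)
H = rng.random((100, 10))      # fixed hidden-layer activations (n_samples x n_hidden)
Y = rng.random((100, 3))       # targets, e.g. one-hot class vectors (n_samples x n_classes)

beta = np.linalg.pinv(H) @ Y   # optimal least-squares output weights, in one step
print(np.linalg.norm(H @ beta - Y))  # the minimal achievable residual for this H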
I would, however, not advise using an extreme learning machine.
Besides the controversy about their origin, they are very limited in the functions they can learn.
There are other well-established approaches for gesture classification that are likely more useful to pursue.

Hyperparameter Tuning k-means clustering

I am trying to perform hyperparameter tuning for Spatio-Temporal K-Means clustering by using it in a pipeline with a Decision Tree classifier. The idea is to use the K-Means clustering algorithm to generate the cluster-distance space matrix and clustered labels, which will then be passed to the Decision Tree classifier. For hyperparameter tuning, I just use the parameters of the K-Means algorithm.
I am using Python 3.8 and sklearn 0.22.
The data I am interested in has 3 columns/attributes: time, x, and y (x and y are spatial coordinates).
The code is:
# imports assumed by this snippet
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans
from sklearn.utils import check_array

class ST_KMeans():
    """
    Note that the K-means clustering algorithm is designed for Euclidean distances.
    It may stop converging with other distances when the mean is no longer a
    best estimation for the cluster 'center'.
    The 'mean' minimizes squared differences (or, squared Euclidean distance).
    If you want a different distance function, you need to replace the mean with
    an appropriate center estimation.
    Parameters:
    k: number of clusters
    eps1 : float, default=0.5
        The spatial density threshold (maximum spatial distance) between
        two points to be considered related.
    eps2 : float, default=10
        The temporal threshold (maximum temporal distance) between two
        points to be considered related.
    metric : string default='euclidean'
        The used distance metric - more options are
        ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’,
        ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’,
        ‘kulsinski’, ‘mahalanobis’, ‘matching’, ‘rogerstanimoto’, ‘sqeuclidean’,
        ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘yule’.
    n_jobs : int or None, default=-1
        The number of processes to start; -1 means use all processors (BE AWARE)
    Attributes:
    labels : array, shape = [n_samples]
        Cluster labels for the data - noise is defined as -1
    """

    def __init__(self, k, eps1=0.5, eps2=10, metric='euclidean', n_jobs=1):
        self.k = k
        self.eps1 = eps1
        self.eps2 = eps2
        # self.min_samples = min_samples
        self.metric = metric
        self.n_jobs = n_jobs

    def fit(self, X):
        """
        Apply the ST K-Means algorithm
        X : 2D numpy array. The first attribute of the array should be time attribute
            as float. The following positions in the array are treated as spatial
            coordinates.
            The structure should look like this [[time_step1, x, y], [time_step2, x, y]..]
            For example 2D dataset:
            array([[0,0.45,0.43],
                   [0,0.54,0.34],...])
        Returns:
            self
        """
        # check if input is correct
        X = check_array(X)
        # type(X)
        # numpy.ndarray
        # Check arguments for DBSCAN algo-
        if not self.eps1 > 0.0 or not self.eps2 > 0.0:
            raise ValueError('eps1, eps2, minPts must be positive')
        # Get dimensions of 'X'-
        # n - number of rows
        # m - number of attributes/columns-
        n, m = X.shape
        # Compute squared form Euclidean Distance Matrix for 'time' and spatial attributes-
        time_dist = squareform(pdist(X[:, 0].reshape(n, 1), metric=self.metric))
        euc_dist = squareform(pdist(X[:, 1:], metric=self.metric))
        '''
        Filter the euclidean distance matrix using time distance matrix. The code snippet gets all the
        indices of the 'time_dist' matrix in which the time distance is smaller than 'eps2'.
        Afterward, for the same indices in the euclidean distance matrix the 'eps1' is doubled which results
        in the fact that the indices are not considered during clustering - as they are bigger than 'eps1'.
        '''
        # filter 'euc_dist' matrix using 'time_dist' matrix-
        dist = np.where(time_dist <= self.eps2, euc_dist, 2 * self.eps1)
        # Initialize K-Means clustering model-
        self.kmeans_clust_model = KMeans(
            n_clusters=self.k, init='k-means++',
            n_init=10, max_iter=300,
            precompute_distances='auto', algorithm='auto')
        # Train model-
        self.kmeans_clust_model.fit(dist)
        self.labels = self.kmeans_clust_model.labels_
        self.X_transformed = self.kmeans_clust_model.fit_transform(X)
        return self

    def transform(self, X):
        # print("\nX.shape = {0}\n".format(X.shape))
        # pass
        # return self.kmeans_clust_model.fit_transform(X)
        return self.X_transformed
# Initialize ST-K-Means object-
st_kmeans_algo = ST_KMeans(
    k=5, eps1=0.6,
    eps2=9, metric='euclidean',
    n_jobs=1
)
# Train on a chunk of dataset-
st_kmeans_algo.fit(data.loc[:500, ['time', 'x', 'y']])
# Get clustered data points labels-
kmeans_labels = st_kmeans_algo.labels
kmeans_labels.shape
# (501,)
# Get labels for points clustered using trained model-
kmeans_transformed = st_kmeans_algo.X_transformed
kmeans_transformed.shape
# (501, 5)
dtc = DecisionTreeClassifier()
dtc.fit(kmeans_transformed, kmeans_labels)
y_pred = dtc.predict(kmeans_transformed)
# Get model performance metrics-
accuracy = accuracy_score(kmeans_labels, y_pred)
precision = precision_score(kmeans_labels, y_pred, average='macro')
recall = recall_score(kmeans_labels, y_pred, average='macro')
print("\nDT model metrics are:")
print("accuracy = {0:.4f}, precision = {1:.4f} & recall = {2:.4f}\n".format(
accuracy, precision, recall
))
# DT model metrics are:
# accuracy = 1.0000, precision = 1.0000 & recall = 1.0000
# Define steps of pipeline-
pipeline_steps = [
    ('st_kmeans_algo', ST_KMeans(k=5, eps1=0.6, eps2=9, metric='euclidean', n_jobs=1)),
    ('dtc', DecisionTreeClassifier())
]
# Instantiate a pipeline-
pipeline = Pipeline(pipeline_steps)
# Train pipeline-
pipeline.fit(kmeans_transformed, kmeans_labels)
However the pipeline.fit() gives the following error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-25-711d6dd8d926> in <module>
----> 1 pipeline = Pipeline(pipeline_steps)

~/.local/lib/python3.8/site-packages/sklearn/pipeline.py in __init__(self, steps, memory, verbose)
    134         self.memory = memory
    135         self.verbose = verbose
--> 136         self._validate_steps()
    137
    138     def get_params(self, deep=True):

~/.local/lib/python3.8/site-packages/sklearn/pipeline.py in _validate_steps(self)
    179         if (not (hasattr(t, "fit") or hasattr(t, "fit_transform")) or not
    180                 hasattr(t, "transform")):
--> 181             raise TypeError("All intermediate steps should be "
    182                             "transformers and implement fit and transform "
    183                             "or be the string 'passthrough' "

TypeError: All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' '<__main__.ST_KMeans object at 0x7f0971db5430>' (type <class '__main__.ST_KMeans'>) doesn't
What's going wrong?
Your error message says it all: All intermediate steps should be transformers and implement fit and transform. In your case, your class ST_KMeans() has to implement a transform function as well to be used in a pipeline. Besides, best-practice is usually to inherit from the classes BaseEstimator and TransformerMixin from the module sklearn.base:
from sklearn.base import BaseEstimator, TransformerMixin

class ST_KMeans(BaseEstimator, TransformerMixin):

    def fit(self, X, y=None):
        ...
        return self

    def transform(self, X):
        return self.X_transformed
Then, you can use your class in a pipeline.
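For the tuning step itself (not covered in the original answer), once ST_KMeans is a proper transformer its parameters can be searched through the pipeline with GridSearchCV, using the step__parameter naming convention; the grid values below are just placeholders:
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

pipeline = Pipeline([
    ('st_kmeans_algo', ST_KMeans(k=5, eps1=0.6, eps2=9, metric='euclidean', n_jobs=1)),
    ('dtc', DecisionTreeClassifier())
])

# parameters of a pipeline step are addressed as "<step name>__<parameter name>"
param_grid = {
    'st_kmeans_algo__k': [3, 5, 7],
    'st_kmeans_algo__eps1': [0.5, 0.6],
    'st_kmeans_algo__eps2': [9, 10],
}

grid = GridSearchCV(pipeline, param_grid, cv=3)
# grid.fit(X, y)   # X: the [time, x, y] data; y: labels for the decision tree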

Gradient descent using many polynomials is not converging

context: I am trying to create a generic function to optimize the cost of any regression problem using polynomial regression (of any specified degree).
I am trying to fit my model to the load_boston dataset (with the house price as the label and 13 features).
I have tried multiple polynomial degrees, learning rates, and epoch counts (with gradient descent), but the MSE comes out very high even on the training data (I am using 100% of the data to train the model and checking the cost on the same data, and the MSE cost is still very high).
import tensorflow as tf
from sklearn.datasets import load_boston

def polynomial(x, coeffs):
    y = 0
    for i in range(len(coeffs)):
        y += coeffs[i]*x**i
    return y

def initial_parameters(dimensions, data_type, degree): # list number of dims/features and degree
    thetas = [tf.Variable(0, dtype=data_type)] # the constant theta/bias
    for i in range(degree):
        thetas.append(tf.Variable(tf.zeros([dimensions, 1], dtype=data_type)))
    return thetas

def regression_error(x, y, thetas):
    hx = thetas[0] # constant thetas - no need to have 1 for each variable (e.g x^0*th + y^0*th...)
    for i in range(1, len(thetas)):
        hx = tf.add(hx, tf.matmul(tf.pow(x, i), thetas[i]))
    return tf.reduce_mean(tf.squared_difference(hx, y))

def polynomial_regression(x, y, data_type, degree, learning_rate, epoch): # features=dimensions=variables
    thetas = initial_parameters(x.shape[1], data_type, degree)
    cost = regression_error(x, y, thetas)
    init = tf.initialize_all_variables()
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
    with tf.Session() as sess:
        sess.run(init)
        for epoch in range(epoch):
            sess.run(optimizer)
        return cost.eval()

x, y = load_boston(True) # yes just use the entire dataset
for deg in range(1, 2):
    for lr in range(-8, -5):
        error = polynomial_regression(x, y, tf.float64, deg, 10**lr, 100)
        print(deg, lr, error)
It outputs 97.3 even though most of the labels are around 30 (degree = 1, learning rate = 10^-6).
What is wrong with the code?
The problem is that the different features are on different orders of magnitude and hence are not compatible with the learning rate, which is the same for all features. Moreover, when using non-zero variable initialization, one has to make sure that these initial values are also compatible with the feature values.
In [1]: from sklearn.datasets import load_boston
In [2]: x, y = load_boston(True)
In [3]: x.std(axis=0)
Out[3]:
array([8.58828355e+00, 2.32993957e+01, 6.85357058e+00, 2.53742935e-01,
1.15763115e-01, 7.01922514e-01, 2.81210326e+01, 2.10362836e+00,
8.69865112e+00, 1.68370495e+02, 2.16280519e+00, 9.12046075e+01,
7.13400164e+00])
In [4]: x.mean(axis=0)
Out[4]:
array([3.59376071e+00, 1.13636364e+01, 1.11367787e+01, 6.91699605e-02,
5.54695059e-01, 6.28463439e+00, 6.85749012e+01, 3.79504269e+00,
9.54940711e+00, 4.08237154e+02, 1.84555336e+01, 3.56674032e+02,
1.26530632e+01])
A common approach is to normalize the input data (e.g. to have zero mean and unit variance) and to choose the initial weights randomly (e.g. normal distribution, std.dev. = 1). sklearn.preprocessing offers various functionality for these cases.
PolynomialFeatures can be used to generate the polynomial features automatically.
StandardScaler scales the data to zero mean and unit variance.
pipeline.Pipeline can be used for convenience to combine these preprocessing steps.
The polynomial_regression function then reduces to:
pipeline = Pipeline([
    ('poly', PolynomialFeatures(degree)),
    ('scaler', StandardScaler())
])
x = pipeline.fit_transform(x)

thetas = tf.Variable(tf.random_normal([x.shape[1], 1], dtype=data_type))
cost = tf.reduce_mean(tf.squared_difference(tf.matmul(x, thetas), y))
# Perform variable initialization and optimizer instantiation here.
# Run optimization over epochs.
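Filling in the omitted steps, a self-contained sketch might look as follows (not from the original answer; it assumes a TF 1.x environment like the question's, and note that load_boston has been removed from recent scikit-learn releases; the degree, learning rate, and epoch count are purely illustrative):
import tensorflow as tf
from sklearn.datasets import load_boston
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

x, y = load_boston(True)
y = y.reshape(-1, 1)  # column vector, so it matches the shape of tf.matmul's output

degree, learning_rate, epochs = 2, 0.01, 500
pipeline = Pipeline([
    ('poly', PolynomialFeatures(degree)),
    ('scaler', StandardScaler())
])
x = pipeline.fit_transform(x)

thetas = tf.Variable(tf.random_normal([x.shape[1], 1], dtype=tf.float64))
cost = tf.reduce_mean(tf.squared_difference(tf.matmul(tf.constant(x), thetas), tf.constant(y)))
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(epochs):
        sess.run(train_op)
    print(sess.run(cost))  # training MSE after optimization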
