I have tried to set up the XGBoost sklearn API's XGBClassifier to use a custom objective function (Brier score) according to the documentation:
.. note:: Custom objective function
A custom objective function can be provided for the ``objective``
parameter. In this case, it should have the signature
``objective(y_true, y_pred) -> grad, hess``:
y_true: array_like of shape [n_samples]
The target values
y_pred: array_like of shape [n_samples]
The predicted values
grad: array_like of shape [n_samples]
The value of the gradient for each sample point.
hess: array_like of shape [n_samples]
The value of the second derivative for each sample point
Here's my attempt:
import numpy as np
from xgboost import XGBClassifier
from sklearn.datasets import load_svmlight_file
train_data = load_svmlight_file('~/agaricus.txt.train')
X = train_data[0].toarray()
y = train_data[1]
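def brier(y_true, y_pred):
    # y_pred arrives as a raw margin; map it to a probability first
    y_pred = 1.0 / (1.0 + np.exp(-y_pred))
    # first derivative of (p - y)^2 with respect to the raw margin
    grad = 2 * y_pred * (y_true - y_pred) * (y_pred - 1)
    # second derivative; the leading factor is p * (1 - p), not p ** (1 - p)
    hess = 2 * y_pred * (1 - y_pred) * (2 * y_pred * (y_true + 1) - y_true - 3 * y_pred ** 2)
    return grad, hess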
m = XGBClassifier(objective=brier, seed=42)
This seemingly results in the correct object:
XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None, gamma=None,
gpu_id=None, importance_type='gain', interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
objective=<function brier at 0x7fe7ac418290>, random_state=None,
reg_alpha=None, reg_lambda=None, scale_pos_weight=None, seed=42,
subsample=None, tree_method=None, validate_parameters=False,
verbosity=None)
However, calling the .fit method seems to reset the m object to the default setup:
m.fit(X, y)
m
XGBClassifier(base_score=0.5, booster=None, colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
importance_type='gain', interaction_constraints=None,
learning_rate=0.300000012, max_delta_step=0, max_depth=6,
min_child_weight=1, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=0, num_parallel_tree=1,
objective='binary:logistic', random_state=42, reg_alpha=0,
reg_lambda=1, scale_pos_weight=1, seed=42, subsample=1,
tree_method=None, validate_parameters=False, verbosity=None)
with objective='binary:logistic'. I noticed this while investigating why I get a worse Brier score when optimising directly for the Brier score than when I use the default binary:logistic, as described here.
So, how can I properly set up XGBClassifier to use my function brier as a custom objective?
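The gradient and Hessian of a Brier objective like the one in the question can be checked numerically before they are handed to the booster. The following is a minimal sketch, assuming the loss is (sigmoid(margin) - y)^2 and that derivatives are taken with respect to the raw margin; the helper names `brier_loss` and `brier_grad_hess` are made up for this illustration, and the Hessian uses p * (1 - p) as its leading factor.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def brier_loss(y_true, margin):
    # per-sample Brier loss as a function of the raw margin
    return (sigmoid(margin) - y_true) ** 2

def brier_grad_hess(y_true, margin):
    # analytic first and second derivatives with respect to the margin
    p = sigmoid(margin)
    grad = 2 * p * (1 - p) * (p - y_true)
    hess = 2 * p * (1 - p) * (2 * p * (y_true + 1) - y_true - 3 * p ** 2)
    return grad, hess

y = np.array([0.0, 1.0, 1.0, 0.0])
x = np.array([-1.5, -0.3, 0.8, 2.0])
h = 1e-4

# central finite differences as an independent reference
num_grad = (brier_loss(y, x + h) - brier_loss(y, x - h)) / (2 * h)
num_hess = (brier_loss(y, x + h) - 2 * brier_loss(y, x) + brier_loss(y, x - h)) / h ** 2

grad, hess = brier_grad_hess(y, x)
print(np.allclose(grad, num_grad, atol=1e-6), np.allclose(hess, num_hess, atol=1e-5))
```

If either comparison fails, the analytic expression is the first place to look, independently of how the objective is passed to XGBClassifier.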
I believe you are confusing objective with the objective function (the obj parameter); the xgboost documentation is quite confusing at times.
The short answer to your question is to change the call to this:
m = XGBClassifier(obj=brier, seed=42)
In more depth, objective determines how xgboost will optimize given an objective function. Usually xgboost infers the objective from the number of classes in your y vector.
I took a snippet from the source code; as you can see, whenever you have only two classes the objective is set to binary:logistic:
class XGBClassifier(XGBModel, XGBClassifierBase):
    def __init__(self, objective="binary:logistic", **kwargs):
        super().__init__(objective=objective, **kwargs)

    def fit(self, X, y, sample_weight=None, base_margin=None,
            eval_set=None, eval_metric=None,
            early_stopping_rounds=None, verbose=True, xgb_model=None,
            sample_weight_eval_set=None, callbacks=None):
        evals_result = {}
        self.classes_ = np.unique(y)
        self.n_classes_ = len(self.classes_)
        xgb_options = self.get_xgb_params()  # <-- obj function is set here
        if callable(self.objective):
            obj = _objective_decorator(self.objective)  # <----- here is the mismatch of the names, if you pass objective as your brier func it will become "binary:logistic"
            xgb_options["objective"] = "binary:logistic"
        else:
            obj = None
        if self.n_classes_ > 2:
            xgb_options['objective'] = 'multi:softprob'  # <----- objective is being set here if n_classes > 2
            xgb_options['num_class'] = self.n_classes_
        # ... 35 lines folded, starting at: feval = eval_metric if callable(eval_metric) else None
        self._Booster = train(xgb_options, train_dmatrix,  # <----- objective is being passed in xgb_options dictionary
                              self.get_num_boosting_rounds(),
                              evals=evals,
                              early_stopping_rounds=early_stopping_rounds,
                              evals_result=evals_result, obj=obj, feval=feval,  # <----- obj function is being passed to lower level api here
                              verbose_eval=verbose, xgb_model=xgb_model,
                              callbacks=callbacks)
        # ... 12 lines folded, starting at: self.objective = xgb_options["objective"]
        return self
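The snippet wraps a callable objective with `_objective_decorator` before passing it on as `obj`. What such a wrapper has to do can be sketched as below. This is a simplified stand-in rather than the library's code: it assumes the wrapper only fetches the labels and reorders the arguments, `FakeDMatrix` is a hypothetical substitute for the real data container, and `logistic_obj` is just an example objective.

```python
import numpy as np

class FakeDMatrix:
    """Hypothetical stand-in for the training data container; holds labels only."""
    def __init__(self, label):
        self._label = np.asarray(label, dtype=float)

    def get_label(self):
        return self._label

def objective_decorator(func):
    """Adapt func(y_true, y_pred) to the low-level obj(preds, dmatrix) form."""
    def inner(preds, dmatrix):
        labels = dmatrix.get_label()
        return func(labels, preds)
    return inner

def logistic_obj(y_true, y_pred):
    # example sklearn-style objective: log loss on raw margins
    p = 1.0 / (1.0 + np.exp(-y_pred))
    return p - y_true, p * (1.0 - p)

low_level_obj = objective_decorator(logistic_obj)
grad, hess = low_level_obj(np.array([0.0, 0.0]), FakeDMatrix([1, 0]))
print(grad, hess)  # [-0.5  0.5] [0.25 0.25]
```

The point of the sketch is only the change of signature: the sklearn-style function receives (y_true, y_pred), while the low-level training call receives (preds, data).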
There is a fixed list of objectives you can set:
objective [default=reg:squarederror]
reg:squarederror: regression with squared loss.
reg:squaredlogerror: regression with squared log loss $\frac{1}{2}[\log(pred+1)-\log(label+1)]^2$. All input labels are required to be greater than -1. Also, see metric rmsle for possible issue with this objective.
reg:logistic: logistic regression
binary:logistic: logistic regression for binary classification, output probability
binary:logitraw: logistic regression for binary classification, output score before logistic transformation
binary:hinge: hinge loss for binary classification. This makes predictions of 0 or 1, rather than producing probabilities.
count:poisson: Poisson regression for count data, output mean of Poisson distribution
max_delta_step is set to 0.7 by default in poisson regression (used to safeguard optimization)
survival:cox: Cox regression for right censored survival time data (negative values are considered right censored). Note that predictions are returned on the hazard ratio scale (i.e., as HR = exp(marginal_prediction) in the proportional hazard function h(t) = h0(t) * HR).
multi:softmax: set XGBoost to do multiclass classification using the softmax objective, you also need to set num_class(number of classes)
multi:softprob: same as softmax, but output a vector of ndata * nclass, which can be further reshaped to ndata * nclass matrix. The result contains predicted probability of each data point belonging to each class.
rank:pairwise: Use LambdaMART to perform pairwise ranking where the pairwise loss is minimized
rank:ndcg: Use LambdaMART to perform list-wise ranking where Normalized Discounted Cumulative Gain (NDCG) is maximized
rank:map: Use LambdaMART to perform list-wise ranking where Mean Average Precision (MAP) is maximized
reg:gamma: gamma regression with log-link. Output is a mean of gamma distribution. It might be useful, e.g., for modeling insurance claims severity, or for any outcome that might be gamma-distributed.
reg:tweedie: Tweedie regression with log-link. It might be useful, e.g., for modeling total loss in insurance, or for any outcome that might be Tweedie-distributed.
Just to confirm that objective can't be your brier function, I manually set the objective to your brier function inside the source code, right before the call to the lower-level API:
class XGBClassifier(XGBModel, XGBClassifierBase):
    def __init__(self, objective="binary:logistic", **kwargs):
        super().__init__(objective=objective, **kwargs)

    def fit(self, X, y, sample_weight=None, base_margin=None,
            eval_set=None, eval_metric=None,
            early_stopping_rounds=None, verbose=True, xgb_model=None,
            sample_weight_eval_set=None, callbacks=None):
        # ... 54 lines folded, starting at: evals_result = {}
        xgb_options["objective"] = xgb_options["obj"]
        self._Booster = train(xgb_options, train_dmatrix,
                              self.get_num_boosting_rounds(),
                              evals=evals,
                              early_stopping_rounds=early_stopping_rounds,
                              evals_result=evals_result, obj=obj, feval=feval,
                              verbose_eval=verbose, xgb_model=xgb_model,
                              callbacks=callbacks)
        # ... 14 lines folded, starting at: self.objective = xgb_options["objective"]
Throws this error:
raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [10:09:53] /private/var/folders/z5/mchb9bz51cx3h97nkw9v0wkr0000gn/T/pip-install-kh801rm0/xgboost/xgboost/src/objective/objective.cc:26: Unknown objective function: `<function brier at 0x10b630d08>`
Objective candidate: binary:hinge
Objective candidate: multi:softmax
Objective candidate: multi:softprob
Objective candidate: rank:pairwise
Objective candidate: rank:ndcg
Objective candidate: rank:map
Objective candidate: reg:squarederror
Objective candidate: reg:squaredlogerror
Objective candidate: reg:logistic
Objective candidate: binary:logistic
Objective candidate: binary:logitraw
Objective candidate: reg:linear
Objective candidate: count:poisson
Objective candidate: survival:cox
Objective candidate: reg:gamma
Objective candidate: reg:tweedie
I'm trying to implement the zero-inflated lognormal loss function from this paper (https://arxiv.org/pdf/1912.07753.pdf, page 5) in LightGBM. Admittedly, I just don't know how: I don't understand how to obtain the gradient and Hessian of this function in order to implement it in LightGBM, and I've never needed to implement a custom loss function before.
The authors of the paper have open-sourced their code, and the function is available in TensorFlow (https://github.com/google/lifetime_value/blob/master/lifetime_value/zero_inflated_lognormal.py), but I'm unable to translate it to fit the parameters required for a custom loss function in LightGBM. As an example of how LightGBM accepts custom loss functions, the log-likelihood loss would be written as:
def loglikelihood(preds, train_data):
    labels = train_data.get_label()
    preds = 1. / (1. + np.exp(-preds))
    grad = preds - labels
    hess = preds * (1. - preds)
    return grad, hess
Similarly, I would need to define a custom eval metric to accompany it, such as:
def binary_error(preds, train_data):
    labels = train_data.get_label()
    preds = 1. / (1. + np.exp(-preds))
    return 'error', np.mean(labels != (preds > 0.5)), False
Both of the above two examples are taken from the following repository:
https://github.com/microsoft/LightGBM/blob/e83042f20633d7f74dda0d18624721447a610c8b/examples/python-guide/advanced_example.py#L136
Would appreciate any help on this, and especially detailed guidance to help me learn how to do this on my own.
According to the LGBM documentation for custom loss functions:
It should have the signature objective(y_true, y_pred) -> grad, hess or objective(y_true, y_pred, group) -> grad, hess:
y_true: numpy 1-D array of shape = [n_samples]
The target values.
y_pred: numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
The predicted values. Predicted values are returned before any transformation, e.g. they are raw margin instead of probability of positive class for binary task.
group: numpy 1-D array
Group/query data. Only used in the learning-to-rank task. sum(group) = n_samples. For example, if you have a 100-document dataset with group = [10, 20, 40, 10, 10, 10], that means that you have 6 groups, where the first 10 records are in the first group, records 11-30 are in the second group, records 31-70 are in the third group, etc.
grad: numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
The value of the first order derivative (gradient) of the loss with respect to the elements of y_pred for each sample point.
hess: numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
The value of the second order derivative (Hessian) of the loss with respect to the elements of y_pred for each sample point.
This is the "translation", as you defined it, of the tensorflow implementation. Most of the work is just defining the functions yourself (i.e. softplus, crossentropy, etc.)
The mean absolute percentage error is used in the linked paper, not sure if that is the eval metric you want to use.
import numpy as np

epsilon = 1e-7

def sigmoid(x):
    # np.exp works element-wise on arrays (math.exp only accepts scalars)
    return 1 / (1 + np.exp(-x))

def softplus(x, beta=1):
    # (1 / beta) * log(1 + exp(beta * x)), computed stably via logaddexp
    return np.logaddexp(0, beta * x) / beta

def BinaryCrossEntropy(y_true, y_pred):
    # y_pred must hold probabilities, not raw logits
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    term_0 = (1 - y_true) * np.log(1 - y_pred + epsilon)
    term_1 = y_true * np.log(y_pred + epsilon)
    # average over the last axis so the result stays per-sample
    return -np.mean(term_0 + term_1, axis=-1)

def lognormal_log_prob(x, loc, scale):
    # log-density of a log-normal distribution with parameters loc and scale
    return (-np.log(x) - np.log(scale) - 0.5 * np.log(2 * np.pi)
            - 0.5 * np.square((np.log(x) - loc) / scale))

def zero_inflated_lognormal_pred(logits):
    positive_probs = sigmoid(logits[..., :1])
    loc = logits[..., 1:2]
    scale = softplus(logits[..., 2:])
    preds = positive_probs * np.exp(loc + 0.5 * np.square(scale))
    return preds

def mean_abs_pct_error(preds, train_data):
    labels = train_data.get_label()
    decile_labels = np.percentile(labels, np.linspace(10, 100, 10))
    decile_preds = np.percentile(preds, np.linspace(10, 100, 10))
    MAPE = sum(np.absolute(decile_preds - decile_labels) / decile_labels)
    return 'error', MAPE, False

def zero_inflated_lognormal_loss(train_data, logits):
    # logits is expected to have shape (n_samples, 3); labels become a column to match
    labels = train_data.get_label().reshape(-1, 1)
    positive = (labels > 0).astype(float)
    # the classification logit goes through the sigmoid before the cross-entropy
    positive_probs = sigmoid(logits[..., :1])
    classification_loss = BinaryCrossEntropy(
        y_true=positive, y_pred=positive_probs)
    loc = logits[..., 1:2]
    scale = np.maximum(
        softplus(logits[..., 2:]),
        np.sqrt(epsilon))
    # replace zero labels by 1 so that the logarithm below is defined
    safe_labels = positive * labels + (
        1 - positive) * np.ones(labels.shape)
    regression_loss = -np.mean(
        positive * lognormal_log_prob(safe_labels, loc, scale),
        axis=-1)
    return classification_loss + regression_loss
Assignment: Implement an SGD classifier with log loss and L2 regularization, using SGD without sklearn
Initialize the weight_vector and intercept term to zeros (write your code in def initialize_weights())
Create a loss function (write your code in def logloss())
$logloss = -\frac{1}{n} \sum_{(Y_t, Y_{pred})} \left( Y_t \log_{10}(Y_{pred}) + (1 - Y_t) \log_{10}(1 - Y_{pred}) \right)$
for each epoch:
    for each batch of data points in train: (keep batch size=1)
        calculate the gradient of loss function w.r.t. each weight in weight vector (write your code in def gradient_dw())
        $dw^{(t)} = x_n \left( y_n - \sigma((w^{(t)})^{T} x_n + b^{t}) \right) - \frac{\lambda}{N} w^{(t)}$
        Calculate the gradient of the intercept (write your code in def gradient_db()) check this
        $db^{(t)} = y_n - \sigma((w^{(t)})^{T} x_n + b^{t})$
        Update weights and intercept (check the equation number 32 in the above mentioned pdf):
        $w^{(t+1)} \leftarrow w^{(t)} + \alpha \, dw^{(t)}$
        $b^{(t+1)} \leftarrow b^{(t)} + \alpha \, db^{(t)}$
    calculate the log loss for train and test with the updated weights (you can check the python assignment 10th question)
    And if you wish, you can compare the previous loss and the current loss, if it is not updating, then you can stop the training
    append this loss in the list (this will be used to see how loss is changing for each epoch after the training is over)
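The steps above can be sketched compactly as follows. This is a minimal illustration under a few assumptions, not a reference solution: it follows the assignment's notation (alpha is the learning rate, lam stands for λ), it uses a small synthetic dataset generated with NumPy instead of make_classification, and the name `sgd_logistic` is made up here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logloss(y_true, y_pred):
    # base-10 logarithm as in the assignment; clip to avoid log(0)
    y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)
    return -np.mean(y_true * np.log10(y_pred) + (1 - y_true) * np.log10(1 - y_pred))

def sgd_logistic(X, y, epochs=5, alpha=0.01, lam=0.001):
    n, d = X.shape
    w, b = np.zeros(d), 0.0              # weights and intercept start at zero
    losses = []
    for _ in range(epochs):
        for x_n, y_n in zip(X, y):       # batch size 1
            err = y_n - sigmoid(w @ x_n + b)
            dw = x_n * err - (lam / n) * w
            db = err
            w = w + alpha * dw
            b = b + alpha * db
        # one loss value per epoch, computed on the whole data set at once
        losses.append(logloss(y, sigmoid(X @ w + b)))
    return w, b, losses

# synthetic data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = (rng.random(500) < sigmoid(X @ true_w)).astype(float)

w, b, losses = sgd_logistic(X, y)
print(len(losses), losses[-1])
```

Computing the predictions for the full matrix in one call keeps each epoch's loss a single scalar, so the loss list has exactly one entry per epoch.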
importing libraries
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import linear_model
Creating custom dataset
X, y = make_classification(n_samples=50000, n_features=15, n_informative=10, n_redundant=5,
                           n_classes=2, weights=[0.7], class_sep=0.7, random_state=15)
Splitting data into train and test
# you need not standardize the data as it is already standardized
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=15)
def initialize_weights(row_vector):
    ''' In this function, we will initialize our weights and bias'''
    #initialize the weights as 1d array consisting of all zeros similar to the dimensions of row_vector
    #you use zeros_like function to initialize zero, check this link https://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros_like.html
    #initialize bias to zero
    w = np.zeros_like(row_vector)  # use the argument rather than the global X_train
    b = 0
    return w, b
dim=X_train[0]
w,b = initialize_weights(dim)
print('w =',(w))
print('b =',str(b))
def sigmoid(z):
    ''' In this function, we will return sigmoid of z'''
    # compute sigmoid(z) and return
    return 1 / (1 + np.exp(-z))

def logloss(y_true, y_pred):
    # you have been given two arrays y_true and y_pred and you have to calculate the logloss
    #while dealing with numpy arrays you can use vectorized operations for quicker calculations as compared to using loops
    #https://www.pythonlikeyoumeanit.com/Module3_IntroducingNumpy/VectorizedOperations.html
    #https://www.geeksforgeeks.org/vectorized-operations-in-numpy/
    #write your code here
    sum = 0
    for i in range(len(y_true)):
        sum += (y_true[i] * np.log10(y_pred[i])) + ((1 - y_true[i]) * np.log10(1 - y_pred[i]))
    loss = -1 * (1 / len(y_true)) * sum
    return loss

#make sure that the sigmoid function returns a scalar value, you can use dot function operation
def gradient_dw(x, y, w, b, alpha, N):
    '''In this function, we will compute the gradient w.r.to w '''
    # the regularization term sits outside the product with x, as in the formula for dw
    dw = x * (y - sigmoid(np.dot(w, x) + b)) - (alpha / N) * w
    return dw

#db should be a scalar value
def gradient_db(x, y, w, b):
    '''In this function, we will compute gradient w.r.to b '''
    db = y - sigmoid(np.dot(w, x) + b)
    return db

# prediction function used to compute predicted_y given the dataset X
def pred(w, b, X):
    N = len(X)
    predict = []
    for i in range(N):
        z = np.dot(w, X[i]) + b
        predict.append(sigmoid(z))
    return np.array(predict)
def train(X_train,y_train,X_test,y_test,epochs,alpha,eta0):
    ''' In this function, we will implement logistic regression'''
    #Here eta0 is learning rate
    #implement the code as follows
    # for every epoch
        # for every data point(X_train,y_train)
            #compute gradient w.r.to w (call the gradient_dw() function)
            #compute gradient w.r.to b (call the gradient_db() function)
            #update w, b
        # predict the output of x_train [for all data points in X_train] using pred function with updated weights
        #compute the loss between predicted and actual values (call the loss function)
        # store all the train loss values in a list
        # predict the output of x_test [for all data points in X_test] using pred function with updated weights
        #compute the loss between predicted and actual values (call the loss function)
        # store all the test loss values in a list
        # you can also compare previous loss and current loss, if loss is not updating then stop the process
    # you have to return w,b , train_loss and test loss
    train_loss = []
    test_loss = []
    # initialize the weights (call the initialize_weights(X_train[0]) function)
    w,b = initialize_weights(X_train[0]) # Initialize the weights
    #write your code to perform SGD
    # for every epoch
    for i in range(epochs):
        train_pred = []
        test_pred = []
        # for every data point(X_train,y_train)
        for j in range(N):
            #compute gradient w.r.to w (call the gradient_dw() function)
            dw= gradient_dw(X_train[j],y_train[j],w,b,alpha,N)
            #compute gradient w.r.to b (call the gradient_db() function)
            db = gradient_db(X_train[j],y_train[j],w,b)
            #update w, b
            w= w+ (eta0* dw)
            b =b+ (eta0* db)
        #predict the output of x_train [for all data points in X_train] using pred function with updated weights
        for k in range(0,N):
            w1_predict = pred(w,b,X_train[k])
            train_pred.append(w1_predict)
        ##compute the loss between predicted and actual values (call the loss function)
        loss_pred1 = logloss(y_train,train_pred)
        train_loss.append(loss_pred1)
        # predict the output of x_test [for all data points in X_test] using pred function with updated weights
        for k in range(len(X_test)):
            w2_predict = pred(w,b,X_test[k])
            test_pred.append(w2_predict)
        #compute the loss between predicted and actual values (call the loss function)
        loss_pred2 = logloss(y_test,test_pred)
        test_loss.append(loss_pred2)
    return w,b,train_loss,test_loss
alpha=0.001
eta0=0.001
N=len(X_train)
epochs=6
w,b,train_loss,test_loss=train(X_train,y_train,X_test,y_test,epochs,alpha,eta0)
from matplotlib import pyplot as plt
epoch = [i for i in range(1,epochs+1,1)]
plt.figure(figsize=(8,6))
plt.grid()
plt.plot(epoch,train_loss , label='train log loss')
plt.plot(epoch,test_loss, label='test log loss')
plt.xlabel("epoch number")
plt.ylabel("log loss")
plt.title("log loss curve of logistic regression")
plt.legend()
plt.show()
I checked all the functions with the grader functions, and they all return True. However, running the code produces the error below. I tried using reshape to change the shape of the train loss, but I still get the error. Please help.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-28-52c352e46321> in <module>()
1 plt.figure(figsize=(8,6))
2 plt.grid()
----> 3 plt.plot(epoch,train_loss , label='train log loss')
4 plt.plot(epoch,test_loss, label='test log loss')
5 plt.xlabel("epoch number")
3 frames
/usr/local/lib/python3.7/dist-packages/matplotlib/axes/_base.py in _plot_args(self, tup, kwargs)
340
341 if x.shape[0] != y.shape[0]:
--> 342 raise ValueError(f"x and y must have same first dimension, but "
343 f"have shapes {x.shape} and {y.shape}")
344 if x.ndim > 2 or y.ndim > 2:
ValueError: x and y must have same first dimension, but have shapes (6,) and (1350,)
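One possible source of the (1350,) shape, assuming the prediction loops run once per epoch as the comments in train describe: pred iterates over the rows of its argument, so passing a single 15-feature row makes it iterate over the 15 feature values instead. The sketch below reuses the logic of pred from the question on made-up inputs to show the resulting shapes.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def pred(w, b, X):
    # same logic as the pred function in the question
    predict = []
    for i in range(len(X)):
        z = np.dot(w, X[i]) + b
        predict.append(sigmoid(z))
    return np.array(predict)

w, b = np.zeros(15), 0
X = np.ones((4, 15))          # 4 made-up samples with 15 features each

per_row = pred(w, b, X[0])    # one row: X[i] is a scalar, so each z is a length-15 array
whole = pred(w, b, X)         # full matrix: one probability per sample

print(per_row.shape)  # (15, 15)
print(whole.shape)    # (4,)
```

A (15, 15) array per prediction gives a loss with 225 entries per epoch, and 6 epochs × 225 = 1350 matches the shape in the error message. Calling pred(w, b, X_train) once per epoch would instead yield one scalar loss per epoch.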
I am implementing multinomial logistic regression using gradient descent + L2 regularization on the MNIST dataset.
My training data is a dataframe with shape (n_samples=1198, features=65).
On each iteration of gradient descent, I take a linear combination of the weights and inputs to obtain 1198 activations (beta^T * X). I then pass these activations through a softmax function. However, I am confused about how I would obtain a probability distribution over 10 output classes for each activation?
My weights are initialized as such
n_features = 65
# init random weights
beta = np.random.uniform(0, 1, n_features).reshape(1, -1)
This is my current implementation.
def softmax(x:np.ndarray):
    exps = np.exp(x)
    return exps/np.sum(exps, axis=0)

def cross_entropy(y_hat:np.ndarray, y:np.ndarray, beta:np.ndarray) -> float:
    """
    Computes cross entropy for multiclass classification
    y_hat: predicted classes, n_samples x n_feats
    y: ground truth classes, n_samples x 1
    """
    n = len(y)
    return - np.sum(y * np.log(y_hat) + beta**2 / n)

def gd(X:pd.DataFrame, y:pd.Series, beta:np.ndarray,
       lr:float, N:int, iterations:int) -> (np.ndarray,np.ndarray):
    """
    Gradient descent
    """
    n = len(y)
    cost_history = np.zeros(iterations)
    for it in range(iterations):
        activations = X.dot(beta.T).values
        y_hat = softmax(activations)
        cost_history[it] = cross_entropy(y_hat, y, beta)
        # gradient of weights
        grads = np.sum((y_hat - y) * X).values
        # update weights
        beta = beta - lr * (grads + 2/n * beta)
    return beta, cost_history
In Multinomial Logistic Regression, you need a separate set of parameters (the pixel weights in your case) for every class. The probability of an instance belonging to a certain class is then estimated as the softmax function of the instance's score for that class. The softmax function makes sure that the estimated probabilities sum to 1 over all classes.
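A minimal sketch of that shape change, assuming 10 classes and random data in place of the real training set: the weights become a matrix with one column per class, and the softmax normalizes across classes (along each row) rather than across samples.

```python
import numpy as np

def softmax(scores):
    # subtract the row maximum for numerical stability, then normalize per sample
    shifted = scores - scores.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 1198, 65, 10

X = rng.normal(size=(n_samples, n_features))             # placeholder for the training data
beta = rng.uniform(0, 1, size=(n_features, n_classes))   # one weight vector per class

probs = softmax(X @ beta)

print(probs.shape)                          # (1198, 10)
print(np.allclose(probs.sum(axis=1), 1.0))  # True
```

Each row of probs is then a probability distribution over the 10 classes for one sample.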
I have implemented a custom mean absolute error (MAE) loss in LightGBM. The gradient is nonzero, but the loss stays constant. How can that be?
My implementation:
def abs_obj(preds, dtrain):
    y_true = dtrain.get_label()
    a = preds - y_true
    grad = np.sign(a)
    hess = np.zeros(len(a))
    return grad, hess

def abs_eval(preds, dtrain):
    y_true = dtrain.get_label()
    loss = np.abs(preds - y_true).sum()
    return "error", loss, False
A minimal reproducible example: the loss stays constant.
import numpy as np
import pandas as pd
import lightgbm as lgb

dtrain = pd.DataFrame({'x':np.random.rand(100),
                       'y':np.random.rand(100)})
ytrain = dtrain.x + 2 * dtrain.y
dval = dtrain
yval = ytrain
lgb_train = lgb.Dataset(dtrain, ytrain)
lgb_valid = lgb.Dataset(dval, yval)
params = {'objective':None,
          'learning_rate':30,
          'num_leaves':33}
clf = lgb.train(params,
                lgb_train,
                valid_sets=[lgb_valid],
                num_boost_round=10,
                verbose_eval=1,
                fobj=abs_obj,
                feval=abs_eval)
For a custom loss in LightGBM, you need a twice differentiable function with a positive second derivative.
To speed up the algorithm, LightGBM uses Newton's approximation to find the optimal leaf value:
y = - L' / L''
(See this blogpost for details).
When the second derivative is zero or the function is not twice differentiable, this approximation is very wrong. LightGBM has built-in objective functions that do not meet this criterion, such as MAE. For these functions it has separate, special implementations.
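The effect of a zero second derivative on the Newton step can be seen on a single leaf. This is a simplified sketch: it ignores regularization terms and the learning rate that a real implementation would also apply, and `newton_leaf_value` is a name made up for the illustration.

```python
import numpy as np

def newton_leaf_value(grad, hess):
    # simplified Newton step for one leaf: -sum(L') / sum(L'')
    with np.errstate(divide="ignore", invalid="ignore"):
        return -np.sum(grad) / np.sum(hess)

y_true = np.array([1.0, 2.0, 4.0])
preds = np.zeros(3)

# squared error L = 0.5 * (pred - y)^2: gradient pred - y, Hessian 1
se_step = newton_leaf_value(preds - y_true, np.ones(3))

# absolute error with the Hessian set to zero, as in abs_obj above
abs_step = newton_leaf_value(np.sign(preds - y_true), np.zeros(3))

print(se_step)   # mean of the residuals, 7/3
print(abs_step)  # inf: the step is undefined when the Hessian sum is zero
```

For the squared error the step lands on the mean of the labels in the leaf; with a zero Hessian the same formula has no meaningful value.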
Let $F \in \mathbb{R}^{S \times F}$ be a matrix of features, I want to classify them using logistic regression with autograd [1]. The code I am using is similar to the one in the following example [2].
The only thing I want to change is that I have an additional weight matrix $W$ in $\mathbb{R}^{F \times L}$ that I want to apply to each feature. So each feature is multiplied by $W$ and then fed into the logistic regression.
Is it somehow possible to train $W$ and the weights of the logistic regression simultaneously using autograd?
I have tried the following code, unfortunately the weights stay at value 0.
import autograd.numpy as np
from autograd import grad

global inputs

def sigmoid(x):
    return 0.5 * (np.tanh(x) + 1)

def logistic_predictions(weights, inputs):
    # Outputs probability of a label being true according to logistic model.
    return sigmoid(np.dot(inputs, weights))

def training_loss(weights):
    global inputs
    # Training loss is the negative log-likelihood of the training labels.
    feature_weights = weights[3:]
    feature_weights = np.reshape(feature_weights, (3, 3))
    inputs = np.dot(inputs, feature_weights)
    preds = logistic_predictions(weights[0:3], inputs)
    label_probabilities = preds * targets + (1 - preds) * (1 - targets)
    return -np.sum(np.log(label_probabilities))

# Build a toy dataset.
inputs = np.array([[0.52, 1.12, 0.77],
                   [0.88, -1.08, 0.15],
                   [0.52, 0.06, -1.30],
                   [0.74, -2.49, 1.39]])
targets = np.array([True, True, False, True])

# Define a function that returns gradients of training loss using autograd.
training_gradient_fun = grad(training_loss)

# Optimize weights using gradient descent.
weights = np.zeros([3 + 3 * 3])
print("Initial loss:", training_loss(weights))
for i in range(100):
    print(i)
    print(weights)
    weights -= training_gradient_fun(weights) * 0.01
print("Trained loss:", training_loss(weights))
[1] https://github.com/HIPS/autograd
[2] https://github.com/HIPS/autograd/blob/master/examples/logistic_regression.py
Typical practice is to concatenate all "vectorized" parameters into the decision variables vector.
If you update logistic_predictions to include the W matrix, via something like
def logistic_predictions(weights_and_W, inputs):
    '''
    Here, :arg weights_and_W: is an array of the form [weights W.ravel()]
    '''
    # Outputs probability of a label being true according to logistic model.
    weights = weights_and_W[:inputs.shape[1]]
    W_raveled = weights_and_W[inputs.shape[1]:]
    n_W = len(W_raveled)
    # integer division, so that reshape receives an int under Python 3
    W = W_raveled.reshape(inputs.shape[1], n_W // inputs.shape[1])
    return sigmoid(np.dot(np.dot(inputs, W), weights))
then simply change training_loss to (from the original source example)
def training_loss(weights_and_W):
    # Training loss is the negative log-likelihood of the training labels.
    preds = logistic_predictions(weights_and_W, inputs)
    label_probabilities = preds * targets + (1 - preds) * (1 - targets)
    return -np.sum(np.log(label_probabilities))
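One thing worth checking with this parameterization is the starting point. The sketch below uses plain NumPy and a finite-difference gradient as a stand-in for autograd (an assumption made only to keep the example self-contained), with the toy data from the question and the targets written as floats. It suggests that an all-zero start is a stationary point of the combined model, because each factor's gradient is proportional to the other factor, while a small random start is not.

```python
import numpy as np

inputs = np.array([[0.52, 1.12, 0.77],
                   [0.88, -1.08, 0.15],
                   [0.52, 0.06, -1.30],
                   [0.74, -2.49, 1.39]])
targets = np.array([1.0, 1.0, 0.0, 1.0])  # floats instead of booleans

def sigmoid(x):
    return 0.5 * (np.tanh(x) + 1)

def training_loss(weights_and_W):
    n = inputs.shape[1]
    weights = weights_and_W[:n]
    W = weights_and_W[n:].reshape(n, -1)
    preds = sigmoid(np.dot(np.dot(inputs, W), weights))
    label_probabilities = preds * targets + (1 - preds) * (1 - targets)
    return -np.sum(np.log(label_probabilities))

def numerical_grad(f, x, h=1e-6):
    # central finite differences, one coordinate at a time
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        g[i] = (f(x + step) - f(x - step)) / (2 * h)
    return g

zero_init = np.zeros(3 + 3 * 3)
random_init = np.random.default_rng(0).normal(scale=0.1, size=3 + 3 * 3)

grad_at_zero = numerical_grad(training_loss, zero_init)
grad_at_random = numerical_grad(training_loss, random_init)

print(np.allclose(grad_at_zero, 0.0))
print(np.abs(grad_at_random).max())
```

Under these assumptions, initializing the concatenated vector with small random values instead of zeros gives gradient descent a nonzero direction to follow.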