I am trying to implement logistic regression with Tensorflow. I assume that I have the labels in the form of {-1, 1}. So, I have implemented the decision function and loss function
def cross_entropy(y_pred, y_true):
return tf.reduce_mean(tf.math.log(1 + tf.math.exp(- y_true * y_pred[:, 0] ))) + tf.nn.l2_loss(W)`
def logistic_regression(x):
return tf.matmul(x, W) + b
Is this correct? The loss is nan.
This is an option,
def logistic_regression(x):
# Apply softmax to normalize the logits to a probability distribution.
return tf.nn.softmax(tf.matmul(x, W) + b)
def cross_entropy(y_pred, y_true):
# Encode label to a one hot vector.
y_true = tf.one_hot(y_true, depth=num_classes)
# Clip prediction values to avoid log(0) error.
y_pred = tf.clip_by_value(y_pred, 1e-9, 1.)
# Compute cross-entropy.
return tf.reduce_mean(-tf.reduce_sum(y_true * tf.math.log(y_pred)))
Take a look at this full implementation
https://builtin.com/data-science/guide-logistic-regression-tensorflow-20
Related
Im trying to implement this zero-inflated log normal loss function based on this paper in lightGBM (https://arxiv.org/pdf/1912.07753.pdf) (page 5). But, admittedly, I just don’t know how. I don’t understand how to get the gradient and hessian of this function in order to implement it in LGBM and I’ve never needed to implement a custom loss function in the past.
The authors of this paper have open sourced their code, and the function is available in tensorflow (https://github.com/google/lifetime_value/blob/master/lifetime_value/zero_inflated_lognormal.py), but I’m unable to translate this to fit the parameters required for a custom loss function in LightGBM. An example of how LGBM accepts custom loss functions— loglikelihood loss would be written as:
def loglikelihood(preds, train_data):
labels = train_data.get_label()
preds = 1. / (1. + np.exp(-preds))
grad = preds - labels
hess = preds * (1. - preds)
return grad, hess
Similarly, I would need to define a custom eval metric to accompany it, such as:
def binary_error(preds, train_data):
labels = train_data.get_label()
preds = 1. / (1. + np.exp(-preds))
return 'error', np.mean(labels != (preds > 0.5)), False
Both of the above two examples are taken from the following repository:
https://github.com/microsoft/LightGBM/blob/e83042f20633d7f74dda0d18624721447a610c8b/examples/python-guide/advanced_example.py#L136
Would appreciate any help on this, and especially detailed guidance to help me learn how to do this on my own.
According to the LGBM documentation for custom loss functions:
It should have the signature objective(y_true, y_pred) -> grad, hess or objective(y_true, y_pred, group) -> grad, hess:
y_true: numpy 1-D array of shape = [n_samples]
The target values.
y_pred: numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
The predicted values. Predicted values are returned before any transformation, e.g. they are raw margin instead of probability of positive class for binary task.
group: numpy 1-D array
Group/query data. Only used in the learning-to-rank task. sum(group) = n_samples. For example, if you have a 100-document dataset with group = [10, 20, 40, 10, 10, 10], that means that you have 6 groups, where the first 10 records are in the first group, records 11-30 are in the second group, records 31-70 are in the third group, etc.
grad: numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
The value of the first order derivative (gradient) of the loss with respect to the elements of y_pred for each sample point.
hess: numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
The value of the second order derivative (Hessian) of the loss with respect to the elements of y_pred for each sample point.
This is the "translation", as you defined it, of the tensorflow implementation. Most of the work is just defining the functions yourself (i.e. softplus, crossentropy, etc.)
The mean absolute percentage error is used in the linked paper, not sure if that is the eval metric you want to use.
import math
import numpy as np
epsilon = 1e-7
def sigmoid(x):
return 1 / (1 + math.exp(-x))
def softplus(beta=1, threshold=20):
return 1 / beta* math.log(1 + math.exp(beta*x))
def BinaryCrossEntropy(y_true, y_pred):
y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
term_0 = (1-y_true) * np.log(1-y_pred + epsilon)
term_1 = y_true * np.log(y_pred + epsilon)
return -np.mean(term_0+term_1, axis=0)
def zero_inflated_lognormal_pred(logits):
positive_probs = sigmoid(logits[..., :1])
loc = logits[..., 1:2]
scale = softplus(logits[..., 2:])
preds = (
positive_probs *
np.exp(loc + 0.5 * np.square(scale)))
return preds
def mean_abs_pct_error(preds, train_data):
labels = train_data.get_label()
decile_labels=np.percentile(labels,np.linspace(10,100,10))
decile_preds=np.percentile(preds,np.linspace(10,100,10))
MAPE = sum(np.absolute(decile_preds - decile_labels)/decile_labels)
return 'error', MAPE, False
def zero_inflated_lognormal_loss(train_data,
logits):
labels = train_data.get_label()
positive = labels > 0
positive_logits = logits[..., :1]
classification_loss = BinaryCrossEntropy(
y_true=positive, y_pred=positive_logits)
loc = logits[..., 1:2]
scale = math.maximum(
softplus(logits[..., 2:]),
math.sqrt(epsilon))
safe_labels = positive * labels + (
1 - positive) * np.ones(labels.shape)
regression_loss = -np.mean(
positive * np.LogNormal(mean=loc, stdev=scale).log_prob(safe_labels),
axis=-1)
return classification_loss + regression_loss
I am implementing multinomial logistic regression using gradient descent + L2 regularization on the MNIST dataset.
My training data is a dataframe with shape (n_samples=1198, features=65).
On each iteration of gradient descent, I take a linear combination of the weights and inputs to obtain 1198 activations (beta^T * X). I then pass these activations through a softmax function. However, I am confused about how I would obtain a probability distribution over 10 output classes for each activation?
My weights are initialized as such
n_features = 65
# init random weights
beta = np.random.uniform(0, 1, n_features).reshape(1, -1)
This is my current implementation.
def softmax(x:np.ndarray):
exps = np.exp(x)
return exps/np.sum(exps, axis=0)
def cross_entropy(y_hat:np.ndarray, y:np.ndarray, beta:np.ndarray) -> float:
"""
Computes cross entropy for multiclass classification
y_hat: predicted classes, n_samples x n_feats
y: ground truth classes, n_samples x 1
"""
n = len(y)
return - np.sum(y * np.log(y_hat) + beta**2 / n)
def gd(X:pd.DataFrame, y:pd.Series, beta:np.ndarray,
lr:float, N:int, iterations:int) -> (np.ndarray,np.ndarray):
"""
Gradient descent
"""
n = len(y)
cost_history = np.zeros(iterations)
for it in range(iterations):
activations = X.dot(beta.T).values
y_hat = softmax(activations)
cost_history[it] = cross_entropy(y_hat, y, beta)
# gradient of weights
grads = np.sum((y_hat - y) * X).values
# update weights
beta = beta - lr * (grads + 2/n * beta)
return beta, cost_history
In Multinomial Logistic Regression, you need a separate set of parameters (the pixel weights in your case) for every class. The probability of an instance belonging to a certain class is then estimated as the softmax function of the instance's score for that class. The softmax function makes sure that the estimated probabilities sum to 1 over all classes.
Assume that y_true and y_pred are in [-1,1]. I want a weighted mean-square-error loss function, in which the loss for samples that are positive in the y_true and negative in y_pred or vice versa are weighted by exp(alpha). Here is my code:
import keras.backend as K
alpha = 1.0
def custom_loss(y_true, y_pred):
se = K.square(y_pred-y_true)
true_label = K.less_equal(y_true,0.0)
pred_label = K.less_equal(y_pred,0.0)
return K.mean(se * K.exp(alpha*K.cast(K.not_equal(true_label,pred_label), tf.float32)))
And here is a plot of this loss function. Different curves are for different values for y_true.
I want to know:
Whether this is a valid loss function, since it is not differentiable in 0?
Is my code correct?
I suggest you this type of loss function to handle imbalance dataset
def focal_loss(y_true, y_pred):
gamma = 2.0, alpha = 0.25
pt_1 = tf.where(tf.equal(y_true, 1), y_pred, tf.ones_like(y_pred))
pt_0 = tf.where(tf.equal(y_true, 0), y_pred, tf.zeros_like(y_pred))
return -K.sum(alpha * K.pow(1. - pt_1, gamma) * K.log(pt_1))-K.sum((1-alpha) * K.pow(pt_0, gamma) * K.log(1. - pt_0))
from this source
I wanted to use a FCN (kind of U-Net) in order to make some semantic segmentation.
I performed it using Python & Keras based on Tensorflow backend. Now I have good results, I'm trying to improve them, and I think one way to do such a thing is by improving my loss computation.
I know that in my output, the several classes are imbalanced, and using the default categorical_crossentropy function can be a problem.
My model inputs and outputs are both in the float32 format, input are channel_first and output and channel_last (permutation done at the end of the model)
In the binary case, when I only want to segment one class, I have change the loss function in this way so it can add the weights case by case depending on the content of the output :
def weighted_loss(y_true, y_pred):
def weighted_binary_cross_entropy(y_true, y_pred):
w = tf.reduce_sum(y_true)/tf_cast(tf_size(y_true), tf_float32)
real_th = 0.5-th
tf_th = tf.fill(tf.shape(y_pred), real_th)
tf_zeros = tf.fill(tf.shape(y_pred), 0.)
return (1.0 - w) * y_true * - tf.log(tf.maximum(tf.zeros, tf.sigmoid(y_pred) + tf_th)) +
(1- y_true) * w * -tf.log(1 - tf.maximum(tf_zeros, tf.sigmoid(y_pred) + tf_th))
return weighted_binary_coss_entropy
Note that th is the activation threshold which by default is 1/nClasses and which I have changed in order to see what value gives me the best results
What do you think about it?
What about change it so it will be able to compute the weighted categorical cross entropy (in the case of multi-class)
Your implementation will work for binary classes , for multi class it will just be
-y_true * tf.log(tf.sigmoid(y_pred))
and use inbuilt tensorflow method for calculating categorical entropy as it avoids overflow for y_pred<0
you can view this answer Unbalanced data and weighted cross entropy ,it explains weighted categorical cross entropy implementation.
The only change for categorical_crossentropy would be
def weighted_loss(y_true, y_pred):
def weighted_categorical_cross_entropy(y_true, y_pred):
w = tf.reduce_sum(y_true)/tf_cast(tf_size(y_true), tf_float32)
loss = w * tf.nn.softmax_cross_entropy_with_logits(onehot_labels, logits)
return loss
return weighted_categorical_cross_entropy
extracting prediction for individual class
def loss(y_true, y_pred):
s = tf.shape(y_true)
# if number of output classes is at last
number_classses = s[-1]
# this will give you one hot code for your prediction
clf_pred = tf.one_hot(tf.argmax(y_pred, axis=-1), depth=number_classses, axis=-1)
# extract the values of y_pred where y_pred is max among the classes
prediction = tf.where(tf.equal(clf_pred, 1), y_pred, tf.zeros_like(y_pred))
# if one hotcode == 1 then class1_prediction == y_pred else class1_prediction ==0
class1_prediction = prediction[:, :, :, 0:1]
# you can compute your loss here on individual class and return the loss ,just for simplicity i am returning the class1_prediction
return class1_prediction
output from model
y_pred = [[[[0.5, 0.3, 0.7],
[0.6, 0.3, 0.2]]
,
[[0.7, 0.9, 0.6],
[0.3 ,0.9, 0.3]]]]
corresponding ground truth
y_true = [[[[0, 1, 0],
[1 ,0, 0]]
,
[[1,0 , 0],
[0,1, 0]]]]
prediction for class 1
prediction = loss(y_true, y_pred)
# prediction = [[[[0. ],[0.6]],[0. ],[0. ]]]]
Let $F \in \mathbb{R}^{S \times F}$ be a matrix of features, I want to classify them using logistic regression with autograd [1]. The code I am using is similar to the one in the following example [2].
The only thing I want to change is that I have an additional weight matrix $W$ in $\mathbb{R}^{F \times L}$ that I want to apply to each feature. So each feature is multiplied with $W$ and then feed into the logistic regression.
Is it somehow possible to train $W$ and the weights of the logistic regression simultaneously using autograd?
I have tried the following code, unfortunately the weights stay at value 0.
import autograd.numpy as np
from autograd import grad
global inputs
def sigmoid(x):
return 0.5 * (np.tanh(x) + 1)
def logistic_predictions(weights, inputs):
# Outputs probability of a label being true according to logistic model.
return sigmoid(np.dot(inputs, weights))
def training_loss(weights):
global inputs
# Training loss is the negative log-likelihood of the training labels.
feature_weights = weights[3:]
feature_weights = np.reshape(feature_weights, (3, 3))
inputs = np.dot(inputs, feature_weights)
preds = logistic_predictions(weights[0:3], inputs)
label_probabilities = preds * targets + (1 - preds) * (1 - targets)
return -np.sum(np.log(label_probabilities))
# Build a toy dataset.
inputs = np.array([[0.52, 1.12, 0.77],
[0.88, -1.08, 0.15],
[0.52, 0.06, -1.30],
[0.74, -2.49, 1.39]])
targets = np.array([True, True, False, True])
# Define a function that returns gradients of training loss using autograd.
training_gradient_fun = grad(training_loss)
# Optimize weights using gradient descent.
weights = np.zeros([3 + 3 * 3])
print "Initial loss:", training_loss(weights)
for i in xrange(100):
print(i)
print(weights)
weights -= training_gradient_fun(weights) * 0.01
print "Trained loss:", training_loss(weights)
[1] https://github.com/HIPS/autograd
[2] https://github.com/HIPS/autograd/blob/master/examples/logistic_regression.py
Typical practice is to concatenate all "vectorized" parameters into the decision variables vector.
If you update logistic_predictions to include the W matrix, via something like
def logistic_predictions(weights_and_W, inputs):
'''
Here, :arg weights_and_W: is an array of the form [weights W.ravel()]
'''
# Outputs probability of a label being true according to logistic model.
weights = weights_and_W[:inputs.shape[1]]
W_raveled = weights_and_W[inputs.shape[1]:]
n_W = len(W_raveled)
W = W_raveled.reshape(inputs.shape[1], n_W/inputs.shape[1])
return sigmoid(np.dot(np.dot(inputs, W), weights))
then simply change traning_loss to (from the original source example)
def training_loss(weights_and_W):
# Training loss is the negative log-likelihood of the training labels.
preds = logistic_predictions(weights_and_W, inputs)
label_probabilities = preds * targets + (1 - preds) * (1 - targets)
return -np.sum(np.log(label_probabilities))