Tensorflow 2.x custom loss function on Google Colaboratory - python

Problem
I am trying to write a custom loss function for my Tensorflow 2 model. I have written the following function that calculates the loss I am seeking when I manually pass in an input and output Tensor.
def on_off_balance_loss(y_true: EagerTensor, y_pred: EagerTensor) -> float:
    y_true_array: ndarray = np.asarray(y_true).flatten()
    y_predict_array: ndarray = np.asarray(y_pred).flatten()
    on_delta: float = 0.999
    on_loss: float = 0
    off_loss: float = 0
    on_count: int = 0
    off_count: int = 0
    for i in range(len(y_true_array)):
        loss: float = cell_loss(y_true_array[i], y_predict_array[i])
        if y_true_array[i] > on_delta:
            on_count += 1
            on_loss = on_loss * ((on_count - 1) / on_count) + (loss / on_count)
        else:
            off_count += 1
            off_loss = off_loss * ((off_count - 1) / off_count) + (loss / off_count)
    on_factor: int = 4
    return (on_factor * on_loss + off_loss) / (on_factor + 1)
For context, y_true consists of a 2D matrix of 1's and 0's as floats, where 0's are much more common. As such, my model was getting a good loss value by just getting most of the 0's correct, even though where the 1's are is the more important metric. This custom loss puts more proportional emphasis on the location of the 1's.
I changed model.compile(loss="binary_crossentropy") to model.compile(loss=on_off_balance_loss) in the attempt to use the new loss function. This doesn't seem to work, as the loss function is supposed to take in an entire batch of data. So, I tried something like this with model.compile(loss=on_off_balance_batch_loss):
def on_off_balance_batch_loss(y_true, y_pred) -> float:
    y_trues: list = tf.unstack(y_true)
    y_preds: list = tf.unstack(y_pred)
    loss: float = 0
    for i in range(0, len(y_trues)):
        loss = loss * (i / (i + 1)) + (on_off_balance_loss(y_trues[i], y_preds[i]) / (i + 1))
    return loss
This doesn't work. The shape of y_true is (None, None, None), and the shape of y_pred is (None, X, Y), where X and Y are the dimensions of the 2D array of 1's and 0's.
I am working in Google Colaboratory. However, the np.asarray() call that throws an error on Colaboratory seems to work fine locally. So I'm not really sure whether the error lies in my loss function or in some setup issue on Colaboratory. I have ensured that I am using Tensorflow 2.3.0 both locally and on Colaboratory.
EDITS:
I tried adding run_eagerly=True to model.compile() and using .numpy() instead of np.asarray() in on_off_balance_loss(). This changed the type of input in on_off_balance_batch_loss from Tensor to EagerTensor. This leads to the error ValueError: No gradients provided for any variable: ['lstm_3/lstm_cell_3/kernel:0', 'lstm_3/lstm_cell_3/recurrent_kernel:0', 'lstm_3/lstm_cell_3/bias:0', 'dense_2/kernel:0', 'dense_2/bias:0', 'lstm_4/lstm_cell_4/kernel:0', 'lstm_4/lstm_cell_4/recurrent_kernel:0', 'lstm_4/lstm_cell_4/bias:0', 'dense_3/kernel:0', 'dense_3/bias:0', 'lstm_5/lstm_cell_5/kernel:0', 'lstm_5/lstm_cell_5/recurrent_kernel:0', 'lstm_5/lstm_cell_5/bias:0'].. The same error occurs if I use
def on_off_balance_batch_loss(y_true: EagerTensor, y_pred: EagerTensor) -> float:
    y_trues = tf.TensorArray(tf.float32, 1, dynamic_size=True, infer_shape=False).unstack(y_true)
    y_preds = tf.TensorArray(tf.float32, 1, dynamic_size=True, infer_shape=False).unstack(y_pred)
    loss: float = 0.0
    i: int = 0
    for tensor in range(y_trues.size()):
        elem_loss: float = on_off_balance_loss(y_trues.read(i), y_preds.read(i))
        loss = loss * (i / (i + 1)) + (elem_loss / (i + 1))
        i += 1
    return loss
and omit run_eagerly=True. Even before the errors are reached, it seems that the whole program runs slower than when I used a default loss function.

Well, it turns out the solution is a lot simpler than what I was trying above. Below is the style in which such a function should be implemented. I compared its output to that of my original function, on_off_balance_loss, and it matches.
def on_off_equal_loss(y_true: Tensor, y_pred: Tensor) -> Tensor:
    on_delta: float = 0.99
    on_mask: Tensor = tf.greater_equal(y_true, on_delta)
    off_mask: Tensor = tf.less(y_true, on_delta)
    on_loss: Tensor = tf.divide(tf.reduce_sum(tf.abs(tf.subtract(
        y_true[on_mask], y_pred[on_mask]
    ))), tf.cast(tf.math.count_nonzero(on_mask), tf.float32))
    off_loss: Tensor = tf.divide(tf.reduce_sum(tf.abs(tf.subtract(
        y_true[off_mask], y_pred[off_mask]
    ))), tf.cast(tf.math.count_nonzero(off_mask), tf.float32))
    on_factor: float = 4.0
    return tf.divide(tf.add(tf.multiply(on_factor, on_loss), off_loss), on_factor + 1.0)
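Once written in this style, the function can be passed to compile() like any built-in loss. A minimal usage sketch, assuming model is an already-built tf.keras model:
model.compile(optimizer="adam", loss=on_off_equal_loss)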

Related

Need help implementing a custom loss function in lightGBM (Zero-inflated Log Normal Loss)

I'm trying to implement this zero-inflated log normal loss function in LightGBM, based on this paper (https://arxiv.org/pdf/1912.07753.pdf, page 5). But, admittedly, I just don't know how. I don't understand how to get the gradient and Hessian of this function in order to implement it in LGBM, and I've never needed to implement a custom loss function in the past.
The authors of this paper have open sourced their code, and the function is available in tensorflow (https://github.com/google/lifetime_value/blob/master/lifetime_value/zero_inflated_lognormal.py), but I'm unable to translate this to fit the parameters required for a custom loss function in LightGBM. As an example of how LGBM accepts custom loss functions, a log-likelihood loss would be written as:
def loglikelihood(preds, train_data):
    labels = train_data.get_label()
    preds = 1. / (1. + np.exp(-preds))
    grad = preds - labels
    hess = preds * (1. - preds)
    return grad, hess
Similarly, I would need to define a custom eval metric to accompany it, such as:
def binary_error(preds, train_data):
    labels = train_data.get_label()
    preds = 1. / (1. + np.exp(-preds))
    return 'error', np.mean(labels != (preds > 0.5)), False
Both of the above two examples are taken from the following repository:
https://github.com/microsoft/LightGBM/blob/e83042f20633d7f74dda0d18624721447a610c8b/examples/python-guide/advanced_example.py#L136
Would appreciate any help on this, and especially detailed guidance to help me learn how to do this on my own.
According to the LGBM documentation for custom loss functions:
It should have the signature objective(y_true, y_pred) -> grad, hess or objective(y_true, y_pred, group) -> grad, hess:
y_true: numpy 1-D array of shape = [n_samples]
The target values.
y_pred: numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
The predicted values. Predicted values are returned before any transformation, e.g. they are raw margin instead of probability of positive class for binary task.
group: numpy 1-D array
Group/query data. Only used in the learning-to-rank task. sum(group) = n_samples. For example, if you have a 100-document dataset with group = [10, 20, 40, 10, 10, 10], that means that you have 6 groups, where the first 10 records are in the first group, records 11-30 are in the second group, records 31-70 are in the third group, etc.
grad: numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
The value of the first order derivative (gradient) of the loss with respect to the elements of y_pred for each sample point.
hess: numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
The value of the second order derivative (Hessian) of the loss with respect to the elements of y_pred for each sample point.
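To make that signature concrete, here is a minimal skeleton for the scikit-learn API; the squared-error objective is only a placeholder illustrating the grad/hess contract, not the zero-inflated loss.
import numpy as np
import lightgbm as lgb

def squared_error_objective(y_true, y_pred):
    # Element-wise first and second derivatives of the loss w.r.t. y_pred
    grad = y_pred - y_true
    hess = np.ones_like(y_pred)
    return grad, hess

# The sklearn API accepts the callable directly as the objective
model = lgb.LGBMRegressor(objective=squared_error_objective)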
This is the "translation", as you defined it, of the tensorflow implementation. Most of the work is just defining the functions yourself (i.e. softplus, crossentropy, etc.)
The mean absolute percentage error is used in the linked paper, not sure if that is the eval metric you want to use.
import numpy as np

epsilon = 1e-7

def sigmoid(x):
    # np.exp so this works element-wise on arrays
    return 1 / (1 + np.exp(-x))

def softplus(x, beta=1):
    return 1 / beta * np.log(1 + np.exp(beta * x))

def BinaryCrossEntropy(y_true, y_pred):
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    term_0 = (1 - y_true) * np.log(1 - y_pred + epsilon)
    term_1 = y_true * np.log(y_pred + epsilon)
    return -np.mean(term_0 + term_1, axis=0)

def lognormal_logpdf(x, loc, scale):
    # Stands in for the LogNormal(loc, scale).log_prob(x) used in the TF version
    return (-np.log(x) - np.log(scale) - 0.5 * np.log(2 * np.pi)
            - np.square(np.log(x) - loc) / (2 * np.square(scale)))

def zero_inflated_lognormal_pred(logits):
    positive_probs = sigmoid(logits[..., :1])
    loc = logits[..., 1:2]
    scale = softplus(logits[..., 2:])
    preds = positive_probs * np.exp(loc + 0.5 * np.square(scale))
    return preds

def mean_abs_pct_error(preds, train_data):
    labels = train_data.get_label()
    decile_labels = np.percentile(labels, np.linspace(10, 100, 10))
    decile_preds = np.percentile(preds, np.linspace(10, 100, 10))
    MAPE = sum(np.absolute(decile_preds - decile_labels) / decile_labels)
    return 'error', MAPE, False

def zero_inflated_lognormal_loss(train_data, logits):
    # Note: this computes the loss value itself; a LightGBM custom objective
    # must ultimately return grad and hess (see the documentation quoted above).
    labels = train_data.get_label()
    positive = labels > 0
    positive_logits = logits[..., :1]
    classification_loss = BinaryCrossEntropy(
        y_true=positive, y_pred=sigmoid(positive_logits))  # logits -> probabilities
    loc = logits[..., 1:2]
    scale = np.maximum(
        softplus(logits[..., 2:]),
        np.sqrt(epsilon))
    safe_labels = positive * labels + (1 - positive) * np.ones(labels.shape)
    regression_loss = -np.mean(
        positive * lognormal_logpdf(safe_labels, loc, scale),
        axis=-1)
    return classification_loss + regression_loss
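For reference, a rough sketch of how the custom eval metric above could be passed to training; the dataset variables (X_train, y_train, X_val, y_val) are placeholders, not from the original post.
import lightgbm as lgb

train_set = lgb.Dataset(X_train, label=y_train)
valid_set = lgb.Dataset(X_val, label=y_val, reference=train_set)

booster = lgb.train(
    params={"objective": "regression", "verbose": -1},
    train_set=train_set,
    valid_sets=[valid_set],
    feval=mean_abs_pct_error,  # custom eval metric defined above
)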

Tensorflow implementation of NT_Xent contrastive loss function?

As the title suggests, I'm trying to train a model based on the SimCLR framework (seen in this paper: https://arxiv.org/pdf/2002.05709.pdf - the NT_Xent loss is stated in equation (1) and Algorithm 1).
I have managed to create a numpy version of the loss function, but this is not suitable to train the model on, as numpy arrays cannot store the required information for back propagation. I am having difficulty converting my numpy code over to Tensorflow. Here is my numpy version:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Define the contrastive loss function, NT_Xent
def NT_Xent(zi, zj, tau=1):
    """ Calculates the contrastive loss of the input data using NT_Xent. The
    equation can be found in the paper: https://arxiv.org/pdf/2002.05709.pdf

    Args:
        zi: One half of the input data, shape = (batch_size, feature_1, feature_2, ..., feature_N)
        zj: Other half of the input data, must have the same shape as zi
        tau: Temperature parameter (a constant), default = 1.

    Returns:
        loss: The complete NT_Xent contrastive loss
    """
    z = np.concatenate((zi, zj), 0)
    loss = 0
    for k in range(zi.shape[0]):
        # Numerator (compare i,j & j,i)
        i = k
        j = k + zi.shape[0]
        sim_ij = np.squeeze(cosine_similarity(z[i].reshape(1, -1), z[j].reshape(1, -1)))
        sim_ji = np.squeeze(cosine_similarity(z[j].reshape(1, -1), z[i].reshape(1, -1)))
        numerator_ij = np.exp(sim_ij / tau)
        numerator_ji = np.exp(sim_ji / tau)
        # Denominator (compare i & j to all samples apart from themselves)
        sim_ik = np.squeeze(cosine_similarity(z[i].reshape(1, -1), z[np.arange(z.shape[0]) != i]))
        sim_jk = np.squeeze(cosine_similarity(z[j].reshape(1, -1), z[np.arange(z.shape[0]) != j]))
        denominator_ik = np.sum(np.exp(sim_ik / tau))
        denominator_jk = np.sum(np.exp(sim_jk / tau))
        # Calculate individual and combined losses
        loss_ij = - np.log(numerator_ij / denominator_ik)
        loss_ji = - np.log(numerator_ji / denominator_jk)
        loss += loss_ij + loss_ji
    # Divide by the total number of samples
    loss /= z.shape[0]
    return loss
I am fairly confident that this function produces the correct results, albeit slowly. I have seen vectorised implementations of it online, such as this one for PyTorch: https://github.com/Spijkervet/SimCLR/blob/master/modules/nt_xent.py (my code produces the same result for identical inputs), but I do not see how their version is mathematically equivalent to the formula in the paper, which is why I am trying to build my own.
As a first try I have converted the numpy functions to their TF equivalents (tf.concat, tf.reshape, tf.math.exp, tf.range, etc.), but I believe my only/main problem is that sklearn's cosine_similarity function returns a numpy array, and I do not know how to build this function myself in Tensorflow. Any ideas?
I managed to figure it out myself!
I did not realise there was a Tensorflow implementation of the cosine similarity function, "tf.keras.losses.CosineSimilarity".
Here is my code:
import tensorflow as tf

# Define the contrastive loss function, NT_Xent (Tensorflow version)
def NT_Xent_tf(zi, zj, tau=1):
    """ Calculates the contrastive loss of the input data using NT_Xent. The
    equation can be found in the paper: https://arxiv.org/pdf/2002.05709.pdf
    (This is the Tensorflow implementation of the standard numpy version found
    in the NT_Xent function).

    Args:
        zi: One half of the input data, shape = (batch_size, feature_1, feature_2, ..., feature_N)
        zj: Other half of the input data, must have the same shape as zi
        tau: Temperature parameter (a constant), default = 1.

    Returns:
        loss: The complete NT_Xent contrastive loss
    """
    z = tf.cast(tf.concat((zi, zj), 0), dtype=tf.float32)
    loss = 0
    for k in range(zi.shape[0]):
        # Numerator (compare i,j & j,i)
        i = k
        j = k + zi.shape[0]
        # Instantiate the cosine similarity loss function
        cosine_sim = tf.keras.losses.CosineSimilarity(axis=-1, reduction=tf.keras.losses.Reduction.NONE)
        sim = tf.squeeze(- cosine_sim(tf.reshape(z[i], (1, -1)), tf.reshape(z[j], (1, -1))))
        numerator = tf.math.exp(sim / tau)
        # Denominator (compare i & j to all samples apart from themselves)
        sim_ik = - cosine_sim(tf.reshape(z[i], (1, -1)), z[tf.range(z.shape[0]) != i])
        sim_jk = - cosine_sim(tf.reshape(z[j], (1, -1)), z[tf.range(z.shape[0]) != j])
        denominator_ik = tf.reduce_sum(tf.math.exp(sim_ik / tau))
        denominator_jk = tf.reduce_sum(tf.math.exp(sim_jk / tau))
        # Calculate individual and combined losses
        loss_ij = - tf.math.log(numerator / denominator_ik)
        loss_ji = - tf.math.log(numerator / denominator_jk)
        loss += loss_ij + loss_ji
    # Divide by the total number of samples
    loss /= z.shape[0]
    return loss
As you can see, I have essentially just swapped out the numpy functions for the TF equivalents. One main point of note is that I had to use "reduction=tf.keras.losses.Reduction.NONE" within the "cosine_sim" function; this was to keep the shapes consistent in "sim_ik" and "sim_jk", because otherwise the resulting loss did not match up with my original numpy implementation.
I also noticed that individually calculating the numerator for i,j and j,i was redundant as the answers were the same, so I have removed one instance of that calculation.
Of course if anybody has a quicker implementation I am more than happy to hear about it!
Here is a more efficient and more stable implementation. Assuming zi and zj are interlaced!
class NT_Xent(tf.keras.layers.Layer):
    """ Normalized temperature-scaled CrossEntropy loss [1]
    [1] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A simple framework for contrastive learning of visual representations,” arXiv. 2020, Accessed: Jan. 15, 2021. [Online]. Available: https://github.com/google-research/simclr.
    """
    def __init__(self, tau=1, **kwargs):
        super().__init__(**kwargs)
        self.tau = tau
        self.similarity = tf.keras.losses.CosineSimilarity(axis=-1, reduction=tf.keras.losses.Reduction.NONE)
        self.criterion = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

    def get_config(self):
        return {"tau": self.tau}

    def call(self, zizj):
        """ zizj is [B,N] tensor with order z_i1 z_j1 z_i2 z_j2 z_i3 z_j3 ...
        batch_size is twice the original batch_size
        """
        batch_size = tf.shape(zizj)[0]
        # Integer division so the eye-matrix size stays an integer tensor
        mask = tf.repeat(tf.repeat(~tf.eye(batch_size // 2, dtype=tf.bool), 2, axis=0), 2, axis=1)

        sim = -1 * self.similarity(tf.expand_dims(zizj, 1), tf.expand_dims(zizj, 0)) / self.tau
        sim_i_j = -1 * self.similarity(zizj[0::2], zizj[1::2]) / self.tau

        pos = tf.reshape(tf.repeat(sim_i_j, repeats=2), (batch_size, -1))
        neg = tf.reshape(sim[mask], (batch_size, -1))

        logits = tf.concat((pos, neg), axis=-1)
        labels = tf.one_hot(tf.zeros((batch_size,), dtype=tf.int32), depth=batch_size - 1)
        return self.criterion(labels, logits)
source: https://github.com/gabriel-vanzandycke/tf_layers
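For completeness, a small usage sketch under the interleaving assumption stated above (the shapes here are illustrative):
import tensorflow as tf

# zi/zj pairs interleaved along the batch axis: z_i1, z_j1, z_i2, z_j2, ...
zizj = tf.random.normal((8, 128))  # 4 original pairs, embedding size 128
loss_fn = NT_Xent(tau=0.5)
print(float(loss_fn(zizj)))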

Do all variables in the loss function have to be tensor with grads in pytorch?

I have the following function
def msfe(ys, ts):
    ys = ys.detach().numpy()  # output from the network
    ts = ts.detach().numpy()  # Target (true labels)
    pred_class = (ys >= 0.5)
    n_0 = sum(ts == 0)  # Number of true negatives
    n_1 = sum(ts == 1)  # Number of true positives
    FPE = sum((ts == 0)[[bool(p) for p in (pred_class == 1)]]) / n_0  # False positive error
    FNE = sum((ts == 1)[[bool(p) for p in (pred_class == 0)]]) / n_1  # False negative error
    loss = FPE**2 + FNE**2
    loss = torch.tensor(loss, dtype=torch.float64, requires_grad=True)
    return loss
and I wonder whether the autograd in Pytorch works properly, since ys and ts do not have the grad flag.
So my question is: do all the variables (FPE, FNE, ys, ts, n_1, n_0) have to be tensors before optimizer.step() works, or is it okay that only the final function (loss) is?
All of the variables you want to optimise via optimizer.step() need to have gradient.
In your case it would be y predicted by network, so you shouldn't detach it (from graph).
Usually you don't change your targets, so those don't need gradients. You shouldn't have to detach them though, tensors by default don't require gradient and won't be backpropagated.
Loss will have gradient if its ingredients (at least one) have gradient.
Overall you rarely need to take care of it manually.
BTW. Don't use numpy with PyTorch; there is rarely a need to do so. You can perform most of the operations you can do on a numpy array on PyTorch's tensors.
BTW2. There is no such thing as Variable in pytorch anymore, only tensors which require gradient and those that don't.
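A small illustration of these points (a sketch; the tensors here are arbitrary):
import torch

targets = torch.zeros(4)                     # plain tensor, requires_grad=False by default
outputs = torch.rand(4, requires_grad=True)  # stands in for the network output

loss = ((outputs - targets) ** 2).mean()
print(loss.requires_grad)  # True: at least one ingredient requires grad
print(loss.grad_fn)        # has a grad_fn node, so backward() can propagate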
Non-differentiability
1.1 Problems with existing code
Indeed, you are using functions which are not differentiable (namely >= and ==). Those will give you trouble only in the case of your outputs, as those require gradient (you can use == and >= for targets though).
Below I have attached your loss function and outlined problems in it in the comments:
# Gradient can't propagate if you detach and work in another framework
# Most Python constructs should be fine, detaching will ruin it though.
def msfe(outputs, targets):
    # outputs=outputs.detach().numpy() # Do not detach, no need to do that
    # targets=targets.detach().numpy() # No need for numpy either
    pred_class = outputs >= 0.5  # This one is non-differentiable

    # n_0 = sum(targets==0) # Do not use sum, there is pytorch function for that
    # n_1 = sum(targets==1)
    n_0 = torch.sum(targets == 0)  # Those are not differentiable, but...
    n_1 = torch.sum(targets == 1)  # It does not matter as those are targets

    # FPE = sum((targets==0)[[bool(p) for p in (pred_class==1)]])/n_0 # Do not use Python bools
    # FNE = sum((targets==1)[[bool(p) for p in (pred_class==0)]])/n_1 # Stay within PyTorch

    # Those two below are non-differentiable due to == sign as well
    FPE = torch.sum((targets == 0.0) * (pred_class == 1.0)).float() / n_0
    FNE = torch.sum((targets == 1.0) * (pred_class == 0.0)).float() / n_1

    # This is obviously fine
    loss = FPE ** 2 + FNE ** 2

    # Loss should be a tensor already, don't do things like that
    # Gradient will not be propagated, you will have a new tensor
    # Always returning gradient of `1` and that's all
    # loss = torch.tensor(loss, dtype=torch.float64, requires_grad=True)
    return loss
1.2 Possible solution
So, you need to get rid of 3 non-differentiable parts. You could in principle try to approximate it with continuous outputs from your network (provided you are using sigmoid as activation). Here is my take:
def msfe_approximation(outputs, targets):
    n_0 = torch.sum(targets == 0)  # Gradient does not flow through it, it's okay
    n_1 = torch.sum(targets == 1)  # Same as above
    FPE = torch.sum((targets == 0) * outputs).float() / n_0
    FNE = torch.sum((targets == 1) * (1 - outputs)).float() / n_1
    return FPE ** 2 + FNE ** 2
Notice that to minimize FPE outputs will try to be zero on the indices where targets are zero. Similarly for FNE, if targets are 1, network will try to output 1 as well.
Notice similarity of this idea to BCELoss (Binary CrossEntropy).
And lastly, an example you can run, just as a sanity check:
if __name__ == "__main__":
    model = torch.nn.Sequential(
        torch.nn.Linear(30, 100),
        torch.nn.ReLU(),
        torch.nn.Linear(100, 200),
        torch.nn.ReLU(),
        torch.nn.Linear(200, 1),
        torch.nn.Sigmoid(),
    )
    optimizer = torch.optim.Adam(model.parameters())
    targets = torch.randint(high=2, size=(64, 1))  # random targets
    inputs = torch.rand(64, 30)  # random data
    for _ in range(1000):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = msfe_approximation(outputs, targets)
        print(loss)
        loss.backward()
        optimizer.step()

    print(((model(inputs) >= 0.5) == targets).float().mean())

Extracting first dimension of a tensor without using get_shape, size and shape functions?

I wrote a loss function in Keras. It has two parameters, y_true and y_pred. My first line of code was: batch = y_pred.get_shape()[0]. Then, in my batch variable, I have the first dimension of y_pred, so I looped over range(batch) and wrote what I wrote. That doesn't matter. The matter is that when I compile everything, I get an error message telling me that batch is not an integer but a tensor. Then, as a beginner in Tensorflow, I started thinking about how to get an integer out of batch, which should be an integer but is a tensor. I tried sess.run(batch) but that didn't help at all. So, my problem is how to get an integer from a tensor that represents an integer value. I would like to use some function which really gives me an integer, not a tensor. Please help. Here is my code:
def custom_loss(y_true, y_pred):
    batch = y_pred.get_shape()[0]
    list_ones = returnListOnes(batch)
    tensor_ones = tf.convert_to_tensor(list_ones)
    loss = 0
    for i in range(batch):
        for j in range(S):
            for k in range(S):
                lista = returnListOnesIndex(batch, [j, k, 0])
                lista_bx = returnListOnesIndex(batch, [j, k, 1])
                lista_by = returnListOnesIndex(batch, [j, k, 2])
                lista_bw = returnListOnesIndex(batch, [j, k, 3])
                lista_bh = returnListOnesIndex(batch, [j, k, 4])
                lista_to_tensor = tf.convert_to_tensor(lista)
                lista_bx_to_tensor = tf.convert_to_tensor(lista_bx)
                lista_by_to_tensor = tf.convert_to_tensor(lista_by)
                lista_bw_to_tensor = tf.convert_to_tensor(lista_bw)
                lista_bh_to_tensor = tf.convert_to_tensor(lista_bh)

                element = tf.reduce_sum(tf.multiply(lista_to_tensor, y_pred))
                element_true = tf.reduce_sum(tf.multiply(lista_to_tensor, y_true))
                element_bx = tf.reduce_sum(tf.multiply(lista_bx_to_tensor, y_pred))
                element_bx_true = tf.reduce_sum(tf.multiply(lista_bx_to_tensor, y_true))
                element_by = tf.reduce_sum(tf.multiply(lista_by_to_tensor, y_pred))
                element_by_true = tf.reduce_sum(tf.multiply(lista_by_to_tensor, y_true))
                element_bw = tf.reduce_sum(tf.multiply(lista_bw_to_tensor, y_pred))
                element_bw_true = tf.reduce_sum(tf.multiply(lista_bw_to_tensor, y_true))
                element_bh = tf.reduce_sum(tf.multiply(lista_bh_to_tensor, y_pred))
                element_bh_true = tf.reduce_sum(tf.multiply(lista_bh_to_tensor, y_true))

                distance = tf.square(tf.subtract(element, element_true))
                distance_bx = tf.square(tf.subtract(element_bx, element_bx_true))
                distance_by = tf.square(tf.subtract(element_by, element_by_true))
                distance_bw = tf.square(tf.subtract(element_bw, element_bw_true))
                distance_bh = tf.square(tf.subtract(element_bh, element_bh_true))

                suma = tf.add(distance, distance_bx)
                suma = tf.add(suma, distance_by)
                suma = tf.add(suma, distance_bw)
                suma = tf.add(suma, distance_bh)

                loss += tf.cond(tf.greater(element_true, 0.),
                                lambda: suma,
                                lambda: distance)
    return loss
As you see, I want the batch variable to be an int so that I can loop and do something. I also tried size and shape, and they didn't work either.
Vectorized code will definitely be more efficient, and I would strongly encourage you to try to write the code in a manner which does not require looping.
However, if you are unable to do so, you can resort to tf.map_fn.
From your code I cannot see where i is used inside your loop. I'm guessing this is a bug (batch should be i inside the loop maybe) or my own blindness - otherwise you can just multiply the result by the batch size...
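For reference, a minimal tf.map_fn sketch that applies a per-sample loss over the batch without needing the batch size as a Python integer; per_sample_loss is a hypothetical stand-in for your per-image computation.
import tensorflow as tf

def per_sample_loss(args):
    # Hypothetical per-image loss; replace with your own computation
    y_true_i, y_pred_i = args
    return tf.reduce_sum(tf.square(y_true_i - y_pred_i))

def custom_loss(y_true, y_pred):
    # Maps per_sample_loss over the leading (batch) dimension
    per_sample = tf.map_fn(per_sample_loss, (y_true, y_pred), dtype=tf.float32)
    return tf.reduce_mean(per_sample)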

Keras custom loss function for YOLO

I am trying to define a custom loss function in Keras
def yolo_loss(y_true, y_pred):
Here the shape of y_true and y_pred are [batch_size,19,19,5].
For each image in the batch, I want to compute the loss as:
loss =
square(y_true[:,:,0] - y_pred[:,:,0])
+ square(y_true[:,:,1] - y_pred[:,:,1])
+ square(y_true[:,:,2] - y_pred[:,:,2])
+ (sqrt(y_true[:,:,3]) - sqrt(y_pred[:,:,3]))
+ (sqrt(y_true[:,:,4]) - sqrt(y_pred[:,:,4]))
I thought of a couple of ways of doing this,
1) using a for loop:
def yolo_loss(y_true, y_pred):
    y_ret = tf.zeros([1, y_true.shape[0]])
    for i in range(0, int(y_true.shape[0])):
        op1 = y_true[i, :, :, :]
        op2 = y_pred[i, :, :, :]
        class_error = tf.reduce_sum(tf.multiply((op1[:, :, 0] - op2[:, :, 0]), (op1[:, :, 0] - op2[:, :, 0])))
        row_error = tf.reduce_sum(tf.multiply((op1[:, :, 1] - op2[:, :, 1]), (op1[:, :, 1] - op2[:, :, 1])))
        col_error = tf.reduce_sum(tf.multiply((op1[:, :, 2] - op2[:, :, 2]), (op1[:, :, 2] - op2[:, :, 2])))
        h_error = tf.reduce_sum(tf.abs(tf.sqrt(op1[:, :, 3]) - tf.sqrt(op2[:, :, 3])))
        w_error = tf.reduce_sum(tf.abs(tf.sqrt(op1[:, :, 4]) - tf.sqrt(op2[:, :, 4])))
        total_error = class_error + row_error + col_error + h_error + w_error
        y_ret[0, i] = total_error
    return y_ret
This however gives me an error:
ValueError: Cannot convert a partially known TensorShape to a Tensor:
(1, ?)
This is because I guess the batch size is undefined.
2) Another way is to apply the sqrt transformations to each of the image tensors in the batch, then subtract them, and then apply the square transform. For example:
1) sqrt(y_true[:,:,:,3])
2) sqrt(y_pred[:,:,:,3])
3) sqrt(y_true[:,:,:,4])
4) sqrt(y_pred[:,:,:,4])
5) y_new = y_true-y_pred
6) square(y_new[:,:,:,0])
7) square(y_new[:,:,:,1])
8) square(y_new[:,:,:,2])
9) reduce_sum for each new tensor in the batch and return o/p in shape [1,batch_size]
However I could not find a way to do this in Keras.
Can someone suggest, what would be the best way to implement this loss function. I am using Keras with tensorflow at the backend.
You can have a look at this GitHub page.
https://github.com/experiencor/keras-yolo2
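In the meantime, a minimal vectorised sketch of the loss described above; it assumes the [batch, 19, 19, 5] layout from the question and returns one loss value per image, with no Python loop over the batch.
import tensorflow as tf

def yolo_loss_vectorized(y_true, y_pred):
    # Squared error on the first three channels
    sq_err = tf.reduce_sum(tf.square(y_true[..., 0:3] - y_pred[..., 0:3]), axis=[1, 2, 3])
    # Absolute difference of square roots on the last two channels
    sqrt_err = tf.reduce_sum(tf.abs(tf.sqrt(y_true[..., 3:5]) - tf.sqrt(y_pred[..., 3:5])), axis=[1, 2, 3])
    return sq_err + sqrt_err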
