I am trying to define a custom loss function in Keras:
def yolo_loss(y_true, y_pred):
Here the shapes of y_true and y_pred are [batch_size, 19, 19, 5].
For each image in the batch, I want to compute the loss as:
loss =
square(y_true[:,:,0] - y_pred[:,:,0])
+ square(y_true[:,:,1] - y_pred[:,:,1])
+ square(y_true[:,:,2] - y_pred[:,:,2])
+ (sqrt(y_true[:,:,3]) - sqrt(y_pred[:,:,3]))
+ (sqrt(y_true[:,:,4]) - sqrt(y_pred[:,:,4]))
I thought of a couple of ways of doing this,
1) using a for loop:
def yolo_loss(y_true, y_pred):
    y_ret = tf.zeros([1, y_true.shape[0]])
    for i in range(0, int(y_true.shape[0])):
        op1 = y_true[i, :, :, :]
        op2 = y_pred[i, :, :, :]
        class_error = tf.reduce_sum(tf.multiply((op1[:, :, 0] - op2[:, :, 0]), (op1[:, :, 0] - op2[:, :, 0])))
        row_error = tf.reduce_sum(tf.multiply((op1[:, :, 1] - op2[:, :, 1]), (op1[:, :, 1] - op2[:, :, 1])))
        col_error = tf.reduce_sum(tf.multiply((op1[:, :, 2] - op2[:, :, 2]), (op1[:, :, 2] - op2[:, :, 2])))
        h_error = tf.reduce_sum(tf.abs(tf.sqrt(op1[:, :, 3]) - tf.sqrt(op2[:, :, 3])))
        w_error = tf.reduce_sum(tf.abs(tf.sqrt(op1[:, :, 4]) - tf.sqrt(op2[:, :, 4])))
        total_error = class_error + row_error + col_error + h_error + w_error
        y_ret[0, i] = total_error
    return y_ret
However, this gives me an error:
ValueError: Cannot convert a partially known TensorShape to a Tensor: (1, ?)
I guess this is because the batch size is undefined.
2) Another way is to apply the sqrt transformation to each of the image tensors in the batch, then subtract them, and then apply the square transform. For example:
1) sqrt(y_true[:,:,:,3])
2) sqrt(y_pred[:,:,:,3])
3) sqrt(y_true[:,:,:,4])
4) sqrt(y_pred[:,:,:,4])
5) y_new = y_true-y_pred
6) square(y_new[:,:,:,0])
7) square(y_new[:,:,:,1])
8) square(y_new[:,:,:,2])
9) reduce_sum for each new tensor in the batch and return the output in shape [1, batch_size]
However, I could not find a way to do this in Keras.
Can someone suggest the best way to implement this loss function? I am using Keras with TensorFlow as the backend.
You can have a look at this GitHub page:
https://github.com/experiencor/keras-yolo2
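For reference, here is a minimal vectorized sketch of the loss described above (essentially approach 2), assuming the tensors really have shape [batch_size, 19, 19, 5] with the channel layout class/row/col/h/w and, as in attempt 1, using the absolute difference of the square roots:
import tensorflow as tf

def yolo_loss(y_true, y_pred):
    # squared error on channels 0-2
    sq_err = tf.reduce_sum(tf.square(y_true[..., :3] - y_pred[..., :3]), axis=[1, 2, 3])
    # absolute difference of square roots on channels 3-4
    sqrt_err = tf.reduce_sum(tf.abs(tf.sqrt(y_true[..., 3:]) - tf.sqrt(y_pred[..., 3:])), axis=[1, 2, 3])
    return sq_err + sqrt_err  # one loss value per image, shape (batch_size,)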
I'm trying to implement the zero-inflated lognormal loss function from this paper (https://arxiv.org/pdf/1912.07753.pdf, page 5) in LightGBM. But, admittedly, I just don't know how. I don't understand how to get the gradient and Hessian of this function in order to implement it in LightGBM, and I've never needed to implement a custom loss function before.
The authors of this paper have open-sourced their code, and the function is available in TensorFlow (https://github.com/google/lifetime_value/blob/master/lifetime_value/zero_inflated_lognormal.py), but I'm unable to translate it to fit the parameters required for a custom loss function in LightGBM. As an example of how LightGBM accepts custom loss functions, a log-likelihood loss would be written as:
def loglikelihood(preds, train_data):
    labels = train_data.get_label()
    preds = 1. / (1. + np.exp(-preds))
    grad = preds - labels
    hess = preds * (1. - preds)
    return grad, hess
Similarly, I would need to define a custom eval metric to accompany it, such as:
def binary_error(preds, train_data):
    labels = train_data.get_label()
    preds = 1. / (1. + np.exp(-preds))
    return 'error', np.mean(labels != (preds > 0.5)), False
Both of the above two examples are taken from the following repository:
https://github.com/microsoft/LightGBM/blob/e83042f20633d7f74dda0d18624721447a610c8b/examples/python-guide/advanced_example.py#L136
Would appreciate any help on this, and especially detailed guidance to help me learn how to do this on my own.
According to the LGBM documentation for custom loss functions:
It should have the signature objective(y_true, y_pred) -> grad, hess or objective(y_true, y_pred, group) -> grad, hess:
y_true: numpy 1-D array of shape = [n_samples]
The target values.
y_pred: numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
The predicted values. Predicted values are returned before any transformation, e.g. they are raw margin instead of probability of positive class for binary task.
group: numpy 1-D array
Group/query data. Only used in the learning-to-rank task. sum(group) = n_samples. For example, if you have a 100-document dataset with group = [10, 20, 40, 10, 10, 10], that means that you have 6 groups, where the first 10 records are in the first group, records 11-30 are in the second group, records 31-70 are in the third group, etc.
grad: numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
The value of the first order derivative (gradient) of the loss with respect to the elements of y_pred for each sample point.
hess: numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
The value of the second order derivative (Hessian) of the loss with respect to the elements of y_pred for each sample point.
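As a minimal, hedged illustration of that signature (a plain squared-error objective, not the ZILN loss), grad and hess are just the first and second derivatives of the per-sample loss with respect to the raw prediction:
import numpy as np

def squared_error_objective(y_true, y_pred):
    # per-sample loss: 0.5 * (y_pred - y_true)**2
    grad = y_pred - y_true           # first derivative w.r.t. y_pred
    hess = np.ones_like(y_pred)      # second derivative w.r.t. y_pred
    return grad, hess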
This is the "translation", as you defined it, of the tensorflow implementation. Most of the work is just defining the functions yourself (i.e. softplus, crossentropy, etc.)
The mean absolute percentage error is used in the linked paper, not sure if that is the eval metric you want to use.
import numpy as np

epsilon = 1e-7

def sigmoid(x):
    # vectorized logistic function
    return 1 / (1 + np.exp(-x))

def softplus(x, beta=1):
    return 1 / beta * np.log(1 + np.exp(beta * x))

def lognormal_logpdf(x, loc, scale):
    # log-density of a lognormal distribution, i.e. log(X) ~ Normal(loc, scale)
    return (-np.log(x) - np.log(scale) - 0.5 * np.log(2 * np.pi)
            - 0.5 * ((np.log(x) - loc) / scale) ** 2)

def BinaryCrossEntropy(y_true, y_pred):
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    term_0 = (1 - y_true) * np.log(1 - y_pred + epsilon)
    term_1 = y_true * np.log(y_pred + epsilon)
    return -np.mean(term_0 + term_1, axis=0)

def zero_inflated_lognormal_pred(logits):
    positive_probs = sigmoid(logits[..., :1])
    loc = logits[..., 1:2]
    scale = softplus(logits[..., 2:])
    preds = positive_probs * np.exp(loc + 0.5 * np.square(scale))
    return preds

def mean_abs_pct_error(preds, train_data):
    labels = train_data.get_label()
    decile_labels = np.percentile(labels, np.linspace(10, 100, 10))
    decile_preds = np.percentile(preds, np.linspace(10, 100, 10))
    MAPE = sum(np.absolute(decile_preds - decile_labels) / decile_labels)
    return 'error', MAPE, False

def zero_inflated_lognormal_loss(train_data, logits):
    # labels as a column vector so they broadcast against the (n, 1) logit slices
    labels = train_data.get_label().reshape(-1, 1)
    positive = (labels > 0).astype(float)

    positive_logits = logits[..., :1]
    # classification part: cross-entropy on the probability of a positive label
    classification_loss = BinaryCrossEntropy(
        y_true=positive, y_pred=sigmoid(positive_logits))

    loc = logits[..., 1:2]
    scale = np.maximum(softplus(logits[..., 2:]), np.sqrt(epsilon))
    safe_labels = positive * labels + (1 - positive) * np.ones(labels.shape)

    # regression part: negative lognormal log-likelihood of the positive labels
    regression_loss = -np.mean(
        positive * lognormal_logpdf(safe_labels, loc, scale),
        axis=-1)
    return classification_loss + regression_loss
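If deriving the gradient and Hessian analytically proves difficult, one generic fallback (a hedged sketch, not specific to the three-parameter ZILN head above) is to approximate them numerically per sample with central differences; per_sample_loss is a hypothetical helper mapping a scalar prediction and label to a scalar loss:
import numpy as np

def numeric_grad_hess(per_sample_loss, preds, labels, h=1e-4):
    f0 = np.array([per_sample_loss(p, y) for p, y in zip(preds, labels)])
    f_plus = np.array([per_sample_loss(p + h, y) for p, y in zip(preds, labels)])
    f_minus = np.array([per_sample_loss(p - h, y) for p, y in zip(preds, labels)])
    grad = (f_plus - f_minus) / (2 * h)            # central-difference first derivative
    hess = (f_plus - 2 * f0 + f_minus) / (h ** 2)  # central-difference second derivative
    return grad, hess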
Problem
I am trying to write a custom loss function for my TensorFlow 2 model. I have written the following function, which calculates the loss I am seeking when I manually pass in an input and an output tensor.
def on_off_balance_loss(y_true: EagerTensor, y_pred: EagerTensor) -> float:
    y_true_array: ndarray = np.asarray(y_true).flatten()
    y_predict_array: ndarray = np.asarray(y_pred).flatten()
    on_delta: float = 0.999
    on_loss: float = 0
    off_loss: float = 0
    on_count: int = 0
    off_count: int = 0
    for i in range(len(y_true_array)):
        loss: float = cell_loss(y_true_array[i], y_predict_array[i])
        if y_true_array[i] > on_delta:
            on_count += 1
            on_loss = on_loss * ((on_count - 1) / on_count) + (loss / on_count)
        else:
            off_count += 1
            off_loss = off_loss * ((off_count - 1) / off_count) + (loss / off_count)
    on_factor: int = 4
    return (on_factor * on_loss + off_loss) / (on_factor + 1)
For context, y_true consists of a 2D matrix of 1's and 0's as floats, where 0's are much more common. As such, my model was getting a good loss value by just getting most of the 0's correct, even though where the 1's are is the more important metric. This custom loss puts more proportional emphasis on the location of the 1's.
I changed model.compile(loss="binary_crossentropy") to model.compile(loss=on_off_balance_loss) in an attempt to use the new loss function. This doesn't seem to work, as the loss function is supposed to take in an entire batch of data. So I tried something like this with model.compile(loss=on_off_balance_batch_loss):
def on_off_balance_batch_loss(y_true, y_pred) -> float:
    y_trues: list = tf.unstack(y_true)
    y_preds: list = tf.unstack(y_pred)
    loss: float = 0
    for i in range(0, len(y_trues)):
        loss = loss * (i / (i + 1)) + (on_off_balance_loss(y_trues[i], y_preds[i]) / (i + 1))
    return loss
This doesn't work. The shape of y_true is (None, None, None), and the shape of y_pred is (None, X, Y), where X and Y are the dimensions of the 2D array of 1's and 0's.
I am working in Google Colaboratory. However, the np.asarray() call that works locally throws an error on Colaboratory. So I'm not really sure whether the error lies in my loss function or in some setup issue in Colaboratory. I have ensured that I am using TensorFlow 2.3.0 both locally and on Colaboratory.
EDITS:
I tried adding run_eagerly=True to model.compile() and using .numpy() instead of np.asarray() in on_off_balance_loss(). This changed the type of input in on_off_balance_batch_loss from Tensor to EagerTensor. This leads to the error ValueError: No gradients provided for any variable: ['lstm_3/lstm_cell_3/kernel:0', 'lstm_3/lstm_cell_3/recurrent_kernel:0', 'lstm_3/lstm_cell_3/bias:0', 'dense_2/kernel:0', 'dense_2/bias:0', 'lstm_4/lstm_cell_4/kernel:0', 'lstm_4/lstm_cell_4/recurrent_kernel:0', 'lstm_4/lstm_cell_4/bias:0', 'dense_3/kernel:0', 'dense_3/bias:0', 'lstm_5/lstm_cell_5/kernel:0', 'lstm_5/lstm_cell_5/recurrent_kernel:0', 'lstm_5/lstm_cell_5/bias:0'].. The same error occurs if I use
def on_off_balance_batch_loss(y_true: EagerTensor, y_pred: EagerTensor) -> float:
    y_trues = tf.TensorArray(tf.float32, 1, dynamic_size=True, infer_shape=False).unstack(y_true)
    y_preds = tf.TensorArray(tf.float32, 1, dynamic_size=True, infer_shape=False).unstack(y_pred)
    loss: float = 0.0
    i: int = 0
    for tensor in range(y_trues.size()):
        elem_loss: float = on_off_balance_loss(y_trues.read(i), y_preds.read(i))
        loss = loss * (i / (i + 1)) + (elem_loss / (i + 1))
        i += 1
    return loss
and omit run_eagerly=True. Even before the errors are reached, it seems that the whole program runs slower than when I used a default loss function.
Well, it turns out the solution is a lot simpler than what I was trying above. Below is the style in which such a function should be implemented. I compared its output to that of my original function, on_off_balance_loss, and it matches.
def on_off_equal_loss(y_true: Tensor, y_pred: Tensor) -> Tensor:
    on_delta: float = 0.99
    on_mask: Tensor = tf.greater_equal(y_true, on_delta)
    off_mask: Tensor = tf.less(y_true, on_delta)
    on_loss: Tensor = tf.divide(tf.reduce_sum(tf.abs(tf.subtract(
        y_true[on_mask], y_pred[on_mask]
    ))), tf.cast(tf.math.count_nonzero(on_mask), tf.float32))
    off_loss: Tensor = tf.divide(tf.reduce_sum(tf.abs(tf.subtract(
        y_true[off_mask], y_pred[off_mask]
    ))), tf.cast(tf.math.count_nonzero(off_mask), tf.float32))
    on_factor: float = 4.0
    return tf.divide(tf.add(tf.multiply(on_factor, on_loss), off_loss), on_factor + 1.0)
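A hedged usage sketch (the actual model from the question is not shown, so a toy model stands in): the function object is passed to compile like any built-in loss.
import tensorflow as tf

# toy stand-in model; the real architecture is not part of the post
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(4, activation="sigmoid"),
])
model.compile(optimizer="adam", loss=on_off_equal_loss)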
As the title suggests, I'm trying to train a model based on the SimCLR framework (see this paper: https://arxiv.org/pdf/2002.05709.pdf - the NT_Xent loss is stated in equation (1) and Algorithm 1).
I have managed to create a numpy version of the loss function, but it is not suitable for training the model, as numpy arrays cannot store the information required for backpropagation. I am having difficulty converting my numpy code over to TensorFlow. Here is my numpy version:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Define the contrastive loss function, NT_Xent
def NT_Xent(zi, zj, tau=1):
    """ Calculates the contrastive loss of the input data using NT_Xent. The
    equation can be found in the paper: https://arxiv.org/pdf/2002.05709.pdf

    Args:
        zi: One half of the input data, shape = (batch_size, feature_1, feature_2, ..., feature_N)
        zj: Other half of the input data, must have the same shape as zi
        tau: Temperature parameter (a constant), default = 1.

    Returns:
        loss: The complete NT_Xent contrastive loss
    """
    z = np.concatenate((zi, zj), 0)

    loss = 0
    for k in range(zi.shape[0]):
        # Numerator (compare i,j & j,i)
        i = k
        j = k + zi.shape[0]
        sim_ij = np.squeeze(cosine_similarity(z[i].reshape(1, -1), z[j].reshape(1, -1)))
        sim_ji = np.squeeze(cosine_similarity(z[j].reshape(1, -1), z[i].reshape(1, -1)))
        numerator_ij = np.exp(sim_ij / tau)
        numerator_ji = np.exp(sim_ji / tau)

        # Denominator (compare i & j to all samples apart from themselves)
        sim_ik = np.squeeze(cosine_similarity(z[i].reshape(1, -1), z[np.arange(z.shape[0]) != i]))
        sim_jk = np.squeeze(cosine_similarity(z[j].reshape(1, -1), z[np.arange(z.shape[0]) != j]))
        denominator_ik = np.sum(np.exp(sim_ik / tau))
        denominator_jk = np.sum(np.exp(sim_jk / tau))

        # Calculate individual and combined losses
        loss_ij = - np.log(numerator_ij / denominator_ik)
        loss_ji = - np.log(numerator_ji / denominator_jk)
        loss += loss_ij + loss_ji

    # Divide by the total number of samples
    loss /= z.shape[0]

    return loss
I am fairly confident that this function produces the correct results, albeit slowly. I have seen vectorised implementations of it online, such as this one for PyTorch: https://github.com/Spijkervet/SimCLR/blob/master/modules/nt_xent.py (my code produces the same result for identical inputs), but I do not see how their version is mathematically equivalent to the formula in the paper, which is why I am trying to build my own.
As a first try, I have converted the numpy functions to their TF equivalents (tf.concat, tf.reshape, tf.math.exp, tf.range, etc.), but I believe my only/main problem is that sklearn's cosine_similarity function returns a numpy array, and I do not know how to build this function myself in TensorFlow. Any ideas?
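For reference, cosine similarity itself is straightforward to express in raw TensorFlow; a minimal sketch (an assumption about one possible approach, not the paper authors' code): L2-normalize the rows and take a matrix product, which gives every pairwise cosine similarity at once.
import tensorflow as tf

def pairwise_cosine_similarity(z):
    # z: (num_samples, features); with unit-length rows, the dot product equals the cosine
    z_norm = tf.math.l2_normalize(z, axis=1)
    return tf.matmul(z_norm, z_norm, transpose_b=True)  # shape (num_samples, num_samples)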
I managed to figure it out myself!
I did not realise there was a TensorFlow implementation of the cosine similarity function, tf.keras.losses.CosineSimilarity.
Here is my code:
import tensorflow as tf

# Define the contrastive loss function, NT_Xent (TensorFlow version)
def NT_Xent_tf(zi, zj, tau=1):
    """ Calculates the contrastive loss of the input data using NT_Xent. The
    equation can be found in the paper: https://arxiv.org/pdf/2002.05709.pdf
    (This is the TensorFlow implementation of the standard numpy version found
    in the NT_Xent function).

    Args:
        zi: One half of the input data, shape = (batch_size, feature_1, feature_2, ..., feature_N)
        zj: Other half of the input data, must have the same shape as zi
        tau: Temperature parameter (a constant), default = 1.

    Returns:
        loss: The complete NT_Xent contrastive loss
    """
    z = tf.cast(tf.concat((zi, zj), 0), dtype=tf.float32)

    loss = 0
    for k in range(zi.shape[0]):
        # Numerator (compare i,j & j,i)
        i = k
        j = k + zi.shape[0]
        # Instantiate the cosine similarity loss function
        cosine_sim = tf.keras.losses.CosineSimilarity(axis=-1, reduction=tf.keras.losses.Reduction.NONE)
        sim = tf.squeeze(- cosine_sim(tf.reshape(z[i], (1, -1)), tf.reshape(z[j], (1, -1))))
        numerator = tf.math.exp(sim / tau)

        # Denominator (compare i & j to all samples apart from themselves)
        sim_ik = - cosine_sim(tf.reshape(z[i], (1, -1)), z[tf.range(z.shape[0]) != i])
        sim_jk = - cosine_sim(tf.reshape(z[j], (1, -1)), z[tf.range(z.shape[0]) != j])
        denominator_ik = tf.reduce_sum(tf.math.exp(sim_ik / tau))
        denominator_jk = tf.reduce_sum(tf.math.exp(sim_jk / tau))

        # Calculate individual and combined losses
        loss_ij = - tf.math.log(numerator / denominator_ik)
        loss_ji = - tf.math.log(numerator / denominator_jk)
        loss += loss_ij + loss_ji

    # Divide by the total number of samples
    loss /= z.shape[0]

    return loss
As you can see, I have essentially just swapped out the numpy functions for their TF equivalents. One main point of note is that I had to use reduction=tf.keras.losses.Reduction.NONE within the cosine_sim function; this was to keep the shapes of sim_ik and sim_jk consistent, because otherwise the resulting loss did not match my original numpy implementation.
I also noticed that individually calculating the numerator for i,j and j,i was redundant as the answers were the same, so I have removed one instance of that calculation.
Of course if anybody has a quicker implementation I am more than happy to hear about it!
Here is a more efficient and more stable implementation. Assuming zi and zj are interlaced!
class NT_Xent(tf.keras.layers.Layer):
    """ Normalized temperature-scaled CrossEntropy loss [1]
    [1] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A simple framework for contrastive learning of visual representations,” arXiv. 2020, Accessed: Jan. 15, 2021. [Online]. Available: https://github.com/google-research/simclr.
    """
    def __init__(self, tau=1, **kwargs):
        super().__init__(**kwargs)
        self.tau = tau
        self.similarity = tf.keras.losses.CosineSimilarity(axis=-1, reduction=tf.keras.losses.Reduction.NONE)
        self.criterion = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

    def get_config(self):
        return {"tau": self.tau}

    def call(self, zizj):
        """ zizj is [B,N] tensor with order z_i1 z_j1 z_i2 z_j2 z_i3 z_j3 ...
        batch_size is twice the original batch_size
        """
        batch_size = tf.shape(zizj)[0]
        # integer division so tf.eye receives an integer size
        mask = tf.repeat(tf.repeat(~tf.eye(batch_size // 2, dtype=tf.bool), 2, axis=0), 2, axis=1)

        sim = -1 * self.similarity(tf.expand_dims(zizj, 1), tf.expand_dims(zizj, 0)) / self.tau
        sim_i_j = -1 * self.similarity(zizj[0::2], zizj[1::2]) / self.tau

        pos = tf.reshape(tf.repeat(sim_i_j, repeats=2), (batch_size, -1))
        neg = tf.reshape(sim[mask], (batch_size, -1))

        logits = tf.concat((pos, neg), axis=-1)
        labels = tf.one_hot(tf.zeros((batch_size,), dtype=tf.int32), depth=batch_size - 1)

        return self.criterion(labels, logits)
source: https://github.com/gabriel-vanzandycke/tf_layers
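A hedged usage sketch of this layer (the shapes and the interlacing are assumptions based on the docstring above):
import tensorflow as tf

# two augmented views of a batch of 8 samples, each projected to 128 dimensions
zi = tf.random.normal((8, 128))
zj = tf.random.normal((8, 128))
# interlace row-wise: z_i1, z_j1, z_i2, z_j2, ... -> shape (16, 128)
zizj = tf.reshape(tf.stack([zi, zj], axis=1), (-1, 128))

loss = NT_Xent(tau=0.5)(zizj)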
This code is built up as follows: my robot takes a picture, and a TensorFlow computer-vision model calculates where in the picture the target object starts. This information (the x1 and x2 coordinates) is passed to a PyTorch model, which should learn to predict the correct motor activations in order to get closer to the target. After the movement is executed, the robot takes a picture again, and the TF CV model should calculate whether the motor activation brought the robot closer to the desired state (x1 at 10, x2 at 31).
However, every time I run the code, PyTorch is not able to calculate the gradients.
I'm wondering if this is a data-type problem or a more general one: is it impossible to calculate the gradients if the loss is not calculated directly from the PyTorch network's output?
Any help and suggestions will be greatly appreciated.
#define policy model (model to learn a policy for my robot)
import torch
import torch.nn as nn
import torch.nn.functional as F

class policy_gradient_model(nn.Module):
    def __init__(self):
        super(policy_gradient_model, self).__init__()
        self.fc0 = nn.Linear(2, 2)
        self.fc1 = nn.Linear(2, 32)
        self.fc2 = nn.Linear(32, 64)
        self.fc3 = nn.Linear(64, 32)
        self.fc4 = nn.Linear(32, 32)
        self.fc5 = nn.Linear(32, 2)

    def forward(self, x):
        x = self.fc0(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = F.relu(self.fc5(x))
        return x

policy_model = policy_gradient_model().double()
print(policy_model)
optimizer = torch.optim.AdamW(policy_model.parameters(), lr=0.005, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, amsgrad=False)
#make robot move as predicted by pytorch network (not all code included)
def move(motor_controls):
    #define curvature
    # motor_controls[0] = sigmoid(motor_controls[0])
    activation_left = 1 + (motor_controls[0]) * 99
    activation_right = 1 + (1 - motor_controls[0]) * 99
    print("activation left:", activation_left, ". activation right:", activation_right, ". time:", motor_controls[1] * 100)
    #start movement
#main
import cv2
import numpy as np
import time
from torch.autograd import Variable

print("start training")
losses = []
losses_end_of_epoch = []
number_of_steps_each_epoch = []

loss_function = nn.MSELoss(reduction='mean')

#each epoch
for epoch in range(2):
    count = 0
    target_reached = False

    while target_reached == False:
        print("epoch: ", epoch, ". step:", count)

        ###process and take picture
        indices = process_picture()

        ###binary_network(sliced)=indices as input for policy model
        optimizer.zero_grad()

        ###output: 1 for curvature, 1 for duration of movement
        motor_controls = policy_model(Variable(torch.from_numpy(indices))).detach().numpy()
        print("NO TANH output for motor: 1)activation left, 2)time ", motor_controls)
        motor_controls[0] = np.tanh(motor_controls[0])
        motor_controls[1] = np.tanh(motor_controls[1])
        print("TANH output for motor: 1)activation left, 2)time ", motor_controls)

        ###execute suggested action
        move(motor_controls)

        ###take and process picture2 (after movement)
        indices = (process_picture())

        ###loss=(binary_network(picture2) - desired
        print("calculate loss")
        print("idx", indices, type(torch.tensor(indices)))
        # loss = 0
        # loss = (indices[0]-10)**2+(indices[1]-31)**2
        # loss = loss/2
        print("shape of indices", indices.shape)

        array = np.zeros((1, 2))
        array[0] = indices
        print(array.shape, type(array))
        array2 = torch.ones([1, 2])

        loss = loss_function(torch.tensor(array).double(), torch.tensor([[10.0, 31.0]]).double()).float()
        print("loss: ", loss, type(loss), loss.shape)
        # array2[0] = loss_function(torch.tensor(array).double(), torch.tensor([[10.0,31.0]]).double()).float()
        losses.append(loss)

        #start line causing the error-message (still part of main)
        ###calculate gradients
        loss.backward()
        #end line causing the error-message (still part of main)

        ###apply gradients
        optimizer.step()
#Output (so far as intended) (not all included)
#calculate loss
idx [14. 15.] <class 'torch.Tensor'>
shape of indices (2,)
(1, 2) <class 'numpy.ndarray'>
loss: tensor(136.) <class 'torch.Tensor'> torch.Size([])
#Error Message:
Traceback (most recent call last):
  File "/home/pi/Desktop/GradientPolicyLearning/PolicyModel.py", line 259, in <module>
    array2.backward()
  File "/home/pi/.local/lib/python3.7/site-packages/torch/tensor.py", line 134, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/pi/.local/lib/python3.7/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
If you call .detach() on the prediction, that will delete the gradients. Since you are first getting indices from the model and then trying to backprop the error, I would suggest
prediction = policy_model(torch.from_numpy(indices))
motor_controls = prediction.clone().detach().numpy()
This keeps the prediction as it is, with the calculated gradients that can be backpropagated.
Now you can do
loss = loss_function(prediction, torch.tensor([[10.0,31.0]]).double()).float()
Note: you might want to call .double() on the prediction if it throws an error.
It is indeed impossible to calculate the gradients if the loss is not calculated directly from the PyTorch network's output, because then you cannot apply the chain rule that backpropagation uses to compute them.
A simple solution: turn on the context manager that sets gradient calculation to ON, if it is off:
torch.set_grad_enabled(True)  # Context-manager
Make sure that all your inputs into the NN, the output of the NN, and the ground-truth/target values are of type torch.Tensor and not list, numpy.array, or any other iterable.
Also, make sure they are not converted to a list or numpy.array at any point either.
In my case, I got this error because I performed a list comprehension on the tensor containing the predicted values from the NN (to get the max value in each row), then converted the list back to a torch.Tensor before calculating the loss.
This back-and-forth conversion disables the gradient calculation.
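As a minimal sketch of the differentiable alternative (with hypothetical shapes): take the row-wise max with tensor operations instead of a Python list comprehension, so autograd keeps tracking the computation.
import torch

preds = torch.randn(4, 3, requires_grad=True)  # stand-in for the network output
row_max = preds.max(dim=1).values              # stays on the autograd graph
loss = row_max.sum()
loss.backward()                                # gradients flow back to preds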
In my case, I got past this error by specifying requires_grad=True when defining my input tensors:
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('dark_background')

# define rosenbrock function and gradient
a = 1
b = 5
def f(x):
    return (a - x[0]) ** 2 + b * (x[1] - x[0] ** 2) ** 2

def jac(x):
    dx1 = -2 * a + 4 * b * x[0] ** 3 - 4 * b * x[0] * x[1] + 2 * x[0]
    dx2 = 2 * b * (x[1] - x[0] ** 2)
    return np.array([dx1, dx2])

# create stochastic rosenbrock function and gradient
def f_rand(x):
    return f(x) * np.random.uniform(0.5, 1.5)

def jac_rand(x):
    return jac(x) * np.random.uniform(0.5, 1.5)

# use hand coded adam
x = np.array([0.1, 0.1])
x0 = x.copy()
j = jac_rand(x)
beta1 = 0.9
beta2 = 0.999
eps = 1e-8
m = x * 0
v = x * 0
learning_rate = .1
for ii in range(200):
    m = (1 - beta1) * j + beta1 * m          # first moment estimate.
    v = (1 - beta2) * (j ** 2) + beta2 * v   # second moment estimate.
    mhat = m / (1 - beta1 ** (ii + 1))       # bias correction.
    vhat = v / (1 - beta2 ** (ii + 1))
    x = x - learning_rate * mhat / (np.sqrt(vhat) + eps)
    j = jac_rand(x)
print('hand code finds optimal to be ', x, f(x))

# attempt to use pytorch
import torch
x_tensor = torch.tensor(x0, requires_grad=True)
optimizer = torch.optim.Adam([x_tensor], lr=learning_rate)

def closure():
    optimizer.zero_grad()
    loss = f_rand(x_tensor)
    loss.backward()
    return loss

for ii in range(200):
    optimizer.step(closure)
print('My PyTorch attempt found ', x_tensor, f(x_tensor))
The following worked for me:
loss.requires_grad = True
loss.backward()
I am currently adapting this Siamese network in Python with Keras. However, I do not understand how the loss works (not the function itself, but which parameters get passed where).
Okay, now step by step, here is how I think this works:
distance = Lambda(euclidean_distance,
                  output_shape=eucl_dist_output_shape)([processed_a, processed_b])
This is the line where the outputs of both individual networks get combined and the custom layer applies the following functions:
def euclidean_distance(vects):
    x, y = vects
    sum_square = K.sum(K.square(x - y), axis=1, keepdims=True)
    return K.sqrt(K.maximum(sum_square, K.epsilon()))

def eucl_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0], 1)
So when the input to this layer is (128, 128) the output would be (128, 1). In the last step the loss is calculated with:
def contrastive_loss(y_true, y_pred):
    '''Contrastive loss from Hadsell-et-al.'06
    http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
    '''
    margin = 1
    square_pred = K.square(y_pred)
    margin_square = K.square(K.maximum(margin - y_pred, 0))
    return K.mean(y_true * square_pred + (1 - y_true) * margin_square)
Here, the predicted 128D vector is compared to the 128D ground truth vector.
Now I changed the Lambda layer to:
distance = Lambda(euclidean_distance,
                  output_shape=eucl_dist_output_shape)([processed_a, processed_b, processed_c])
so I have three networks now with the following adapted functions (which should just combine the three outputs to one output, with a shape of (128, 3)):
def euclidean_distance(vects):
    return vects

def eucl_dist_output_shape(shapes):
    shape1, shape2, shape3 = shapes
    return (shape1, shape2, shape3)
and then the new loss function:
def loss_desc_triplet(vects, margin=5):
    """Triplet loss.
    """
    d1, d2, d3 = vects
    d_pos = K.sqrt(K.sum(K.square(d1 - d2), axis=1))
    pair_dist_1_to_3 = K.sqrt(K.sum(K.square(d1 - d3), axis=1))
    d_neg = pair_dist_1_to_3

    return Activation.relu(d_pos - d_neg + margin)
But now I get this error:
  File "DeepLearningWithAugmentationWithTriplets.py", line 233, in
    output_shape=eucl_dist_output_shape)([processed_a, processed_b, processed_c])
  File "lib/python3.7/site-packages/keras/engine/base_layer.py", line 497, in call
    arguments=user_kwargs)
  File "lib/python3.7/site-packages/keras/engine/base_layer.py", line 565, in _add_inbound_node
    output_tensors[i]._keras_shape = output_shapes[i]
IndexError: list index out of range
But I am not sure what causes this.
I fixed the problem by concatenating the outputs:
merged_vector = concatenate([processed_a, processed_b, processed_c], axis=-1, name='merged_layer')
and then disassembling the vector in my loss function:
d1 = y_pred[:,0:128]
d2 = y_pred[:,128:256]
d3 = y_pred[:,256:384]
Though, I am not sure if that is the best solution.
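For what it's worth, a hedged sketch of how the full loss could look with this concatenation approach, assuming 128-dimensional descriptors and the margin formulation from the question (y_true only serves as a dummy label here):
from keras import backend as K

def triplet_loss(y_true, y_pred, margin=5.0):
    # y_pred is the (batch, 384) output of merged_layer above
    d1 = y_pred[:, 0:128]
    d2 = y_pred[:, 128:256]
    d3 = y_pred[:, 256:384]
    d_pos = K.sqrt(K.sum(K.square(d1 - d2), axis=1) + K.epsilon())
    d_neg = K.sqrt(K.sum(K.square(d1 - d3), axis=1) + K.epsilon())
    # hinge: only penalize when the positive pair is not closer by at least `margin`
    return K.mean(K.maximum(d_pos - d_neg + margin, 0.0))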