I am trying to understand how Keras actually computes the gradients of a custom loss in a general setting.
Normally, a loss is defined as a sum over the samples of independent contributions, which makes it straightforward to parallelise the gradient computation.
However, if I add a global nonlinearity on top of it, thus coupling the contributions of the individual samples, is Keras able to differentiate it properly?
In practice, does it actually minimise f(sum_i(x_i)), or does it compute the loss one sample at a time, reducing it to sum_i(f(x_i))?
Below is an example using a log function.
def custom_loss(y_true, y_pred):
    return K.log(1 + K.mean((y_pred - y_true) * (y_pred - y_true)))
I have checked the documentation but couldn't find a precise answer.
It minimizes whatever you tell it to minimize.
If you want to minimize the log of the whole sum, then apply the log after the sum.
If you want to minimize the log of each sample and sum later, then apply the log before the sum.
import tensorflow.keras.backend as K

def log_of_sum(y_true, y_pred):
    return K.log(1 + K.mean(K.square(y_true - y_pred)))

def sum_of_logs(y_true, y_pred):
    return K.mean(K.log(1 + K.square(y_true - y_pred)))

# The mean is optional here: you can return all the samples and Keras will handle it.
# Returning all the samples allows other features to work, like sample weights.
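To see that the two losses really are different objectives, here is a small framework-free sketch in plain Python (not Keras code; the function names and the sample values are made up for illustration). It evaluates both forms on a tiny batch and also writes out the analytic gradient of the log-of-mean form, where every sample's gradient is scaled by a factor that depends on the whole batch.

```python
import math

def log_of_mean(y_true, y_pred):
    """log(1 + mean squared error): the log couples all samples."""
    mse = sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.log(1 + mse)

def mean_of_logs(y_true, y_pred):
    """Mean of log(1 + squared error): samples stay independent."""
    terms = [math.log(1 + (p - t) ** 2) for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)

def grad_log_of_mean(y_true, y_pred):
    """Analytic gradient w.r.t. each prediction: 2 * (p_i - t_i) / (n * (1 + mse))."""
    n = len(y_true)
    mse = sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n
    return [2 * (p - t) / (n * (1 + mse)) for t, p in zip(y_true, y_pred)]

# Illustrative batch of three samples
y_true = [0.0, 0.0, 0.0]
y_pred = [1.0, 2.0, 3.0]

loss_a = log_of_mean(y_true, y_pred)    # f(mean_i(x_i))
loss_b = mean_of_logs(y_true, y_pred)   # mean_i(f(x_i))
grads = grad_log_of_mean(y_true, y_pred)
```

The shared factor 1 / (1 + mse) in the gradient is what the chain rule produces when the log sits outside the mean, so an automatic differentiation framework that is handed the log-of-mean expression will differentiate that expression, not a per-sample approximation of it.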
I have two parameters which I want a neural network to predict. What is the best or most conventional method to implement the loss function? Currently I just define the loss, torch.nn.L1Loss(), which automatically computes the mean for both parameters such that it becomes a scalar.
Another plausible method would be to create two loss functions, one for each parameter, and successively backpropagate.
I don't really see whether both methods compute the same thing and whether one method is better (or plain wrong).
The problem can be seen as a multi-task problem, with the two parameters corresponding to task A and task B respectively.
In multi-task learning, two loss functions are often used.
The usual form is as follows:
$$\text{total\_loss} = \alpha \cdot \text{A\_loss}(\hat{y}_1, y_1) + \beta \cdot \text{B\_loss}(\hat{y}_2, y_2)$$
Here $\alpha$ and $\beta$ are the weights of the two loss terms. Usually they are both 1 or 0.5.
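As a rough illustration of the weighted form, here is a plain-Python sketch (no PyTorch; the helper names, the weights and the data are all hypothetical). It also shows why a single combined loss and two separately backpropagated losses agree: differentiation is linear, so the gradient of the weighted sum is the weighted sum of the per-task gradients.

```python
def l1_loss(pred, target):
    """Mean absolute error over a list of values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def l1_grad(pred, target):
    """Subgradient of the mean absolute error w.r.t. each prediction."""
    n = len(pred)
    return [((p > t) - (p < t)) / n for p, t in zip(pred, target)]

alpha, beta = 1.0, 0.5                      # hypothetical task weights
pred_a, target_a = [2.0, 0.0], [1.0, 1.0]   # task A predictions and targets
pred_b, target_b = [3.0, 5.0], [1.0, 1.0]   # task B predictions and targets

# Single scalar loss combining both tasks
total_loss = alpha * l1_loss(pred_a, target_a) + beta * l1_loss(pred_b, target_b)

# Gradient of the combined loss, obtained task by task and scaled by the weights
grad_a = [alpha * g for g in l1_grad(pred_a, target_a)]
grad_b = [beta * g for g in l1_grad(pred_b, target_b)]
```

Note that taking one mean over both parameters, with the same number of values for each, corresponds to the special case alpha = beta = 0.5.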
I asked a similar question before but got no response, so I am trying again.
I am reading a paper which suggests adding a value that is calculated outside of TensorFlow to the loss function of a neural network model in TensorFlow. Here is the quote (I have blurred the unimportant part):
How do I add a precalculated value to the loss function when fitting a Sequential model in TensorFlow?
The loss function used is BinaryCrossentropy, as you can see in equation (4) of the paper quote. The value to be added is shown in the quote, but I don't think it is important for the question.
It is also not important what my model looks like; I just want to add a constant value to my loss function in TensorFlow when fitting my model.
Thank you very much!!
As you can see in the equation above, there is a chance that the outcome is very low, i.e. the vanishing gradient problem may occur.
To alleviate that, they suggest adding a constant value to the loss.
Now, you can add a simple constant such as 1 or 10, or something proportional to the term they describe.
You can easily calculate the expectation from the ground truth for one part. The other part is the tricky one, as you won't have its values until you train, and calculating them on the fly is not wise.
That term measures how much difference there will be between the ground truth and the predictions.
So, if you are going to implement this paper, add a constant value of 1 to your loss so that it doesn't vanish.
It seems that you want to be able to define your own loss. Also, I am not sure whether you use actual Tensorflow or Keras. Here is a solution with Keras:
import tensorflow.keras.backend as K
from tensorflow.keras.models import Sequential

def my_custom_loss(precomputed_value):
    # Keras only passes (y_true, y_pred) to a loss, so the extra value is captured by a closure
    def loss(y_true, y_pred):
        return K.binary_crossentropy(y_true, y_pred) + precomputed_value
    return loss

my_model = Sequential()
my_model.add(...)
# Add any layer there
my_model.compile(loss=my_custom_loss(42))
Inspired by https://towardsdatascience.com/advanced-keras-constructing-complex-custom-losses-and-metrics-c07ca130a618
EDIT: The answer was only for adding a constant term, but I realize that the term suggested in the paper is not constant.
I haven't read the paper, but I suppose from the cross-entropy definition that sigma is the ground truth and p is the predicted value. If there is no other dependency, the solution can be even simpler:
def my_custom_loss(y_true, y_pred):
    # Keras calls the loss as loss(y_true, y_pred), in that order
    norm_term = K.square(K.mean(y_true) - K.mean(y_pred))
    return K.binary_crossentropy(y_true, y_pred) + norm_term

# ...
my_model.compile(loss=my_custom_loss)
Here, I assumed the expectations are computed on each batch only. Tell me whether that is what you want. Otherwise, if you want to compute your statistics at a different scale, e.g. on the whole dataset after every epoch, you might need to use callbacks.
In that case, please give more detail about your problem, for instance by adding a small example of y_pred and y_true and the expected loss.
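For reference, here is a small worked example of the per-batch version in plain Python (a sketch of the arithmetic only, not Keras code; the helper names, the clipping constant and the sample values are assumptions for illustration):

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample binary cross-entropy, with predictions clipped away from 0 and 1."""
    losses = []
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
    return losses

def loss_with_batch_term(y_true, y_pred):
    """Mean cross-entropy plus the squared gap between the batch means."""
    bce = binary_crossentropy(y_true, y_pred)
    mean_true = sum(y_true) / len(y_true)
    mean_pred = sum(y_pred) / len(y_pred)
    norm_term = (mean_true - mean_pred) ** 2
    return sum(bce) / len(bce) + norm_term

# Illustrative batch of four samples
y_true = [1.0, 0.0, 1.0, 0.0]
y_pred = [0.9, 0.2, 0.6, 0.1]
batch_loss = loss_with_batch_term(y_true, y_pred)
```

Because the extra term is built from the batch means, its value (and its gradient) depends on which samples end up in the same batch, which is the assumption mentioned above.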
I am learning how to use TensorFlow and I am really stuck at one particular point that I cannot make sense of. Imagine I have a 5-layer network whose output is represented by output. Now suppose I want to find the gradient of output with respect to layer_2. For that purpose, the code I would write in TensorFlow is something like:
gradients_i_want = tf.gradients(output, layer_2)
Theoretically, this gradient should be calculated via the chain rule. Does TensorFlow calculate these gradients via the chain rule, or does it just take the derivative of output with respect to layer_2 directly?
Tensorflow will create a graph for your model, where each node is an operation (e.g. addition, multiplication, or a combination of them). Basic ops have manually defined gradient functions, and those functions will be used when applying the chain rule while traveling backwards through the graph.
If you write your own custom op, you might need to also write the corresponding gradient function.
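To make the idea concrete, here is a toy reverse-mode sketch in plain Python. It is not how TensorFlow is implemented internally, only an illustration of the same principle under simplifying assumptions: scalar values, two ops (add and multiply), and hypothetical names such as Node and backward. Each op records its local derivatives, and the backward pass multiplies them along the graph.

```python
class Node:
    """A scalar value in a computation graph that remembers how it was produced."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent_node, local_derivative)
        self.grad = 0.0

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a * b)/da = b and d(a * b)/db = a
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))


def backward(output):
    """Accumulate d(output)/d(node) for every ancestor using the chain rule."""
    order, seen = [], set()

    def visit(node):
        # Depth-first search that yields a topological ordering of the graph
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local_derivative in node.parents:
            parent.grad += node.grad * local_derivative


x = Node(3.0)
w = Node(2.0)
layer_2 = x * w                    # intermediate value, 6.0
output = layer_2 * layer_2 + x     # 6 * 6 + 3 = 39.0
backward(output)
```

After the backward pass, layer_2.grad holds d(output)/d(layer_2) = 2 * layer_2 = 12, and x.grad holds 25, which combines the path through layer_2 (12 * w) with the direct dependence on x.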
I am trying to slightly modify the loss function of my convnet and I have some questions about the implementation.
I already know how to create a custom loss function in Keras and how to call it. But it is still not clear to me where to include the derivative of the function.
Let's say that my new loss function is:
Loss = cross-entropy + f(x)
where f(x) = x**2.
Where should I include f'(x)=2x so that it is used in the back-prop step?
Does Keras automatically do that? Or should I define this explicitly in some part?
Thanks for any hint on this, since I do not know how to do it.
Chuan.
The loss must be a function of a) your network's output and b) the correct labels.
Having loss = sum(a, b) makes your network minimize both a and b.
Minimizing x**2 brings x close to zero.
Minimizing softmax(): since softmax(x) is not a loss function, is defined only for a vector x, and serves to make a vector sum to 1, you can't really minimize it. I guess you are mixing concepts here.
Softmax is an activation function, and its output can be used to compute a loss, e.g. log loss.
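To illustrate the distinction, here is a minimal plain-Python sketch (hypothetical helper names and made-up logits): softmax turns a vector of scores into probabilities, and the log loss is then computed from those probabilities and the correct label.

```python
import math

def softmax(logits):
    """Map a vector of scores to probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def log_loss(probs, true_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[true_index])

logits = [2.0, 1.0, 0.1]            # network outputs before the activation
probs = softmax(logits)             # activation: a probability vector
loss = log_loss(probs, true_index=0)  # scalar loss computed from the activation output
```

The scalar loss is the quantity that gets minimized; softmax on its own only produces the vector the loss is computed from.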
I am trying to define a custom loss function in Keras where I have an additional term that is an integral over the domain of the neural network output. So this would look like:
The key point is that the integral runs over an entire domain that I've specified, not just training data. I don't mind using any form of quadrature to evaluate the integral, I just need to be able to evaluate it. Currently, as far as the documentation indicates, this is not possible to do with a custom loss as it only provides access to y_pred and y_true.
Is there any way of achieving this in Keras?
If the idea is just defining extra variables, you can do this either inside (locally) or outside (globally) the loss function, using keras backend functions:
import keras.backend as K

myDomain = K.variable(range(100)) / 10  # for instance

def custom_loss(y_true, y_pred):
    localVar = K.variable([[1, 2], [3, 1]])
    # calculationsWith is a placeholder for your own backend computation
    return calculationsWith(y_true, y_pred, localVar, myDomain)
It's important that you use functions coming from the backend to do the calculations (either from K or directly from TensorFlow, Theano or CNTK).
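As for the quadrature itself, here is a plain-Python sketch of a composite trapezoidal rule over a fixed grid like the one above. It only illustrates the arithmetic: the function model is a hypothetical stand-in for the network output, and inside an actual Keras loss the same sums would have to be written with backend ops so that they stay differentiable.

```python
def trapezoid(values, grid):
    """Composite trapezoidal rule for samples `values` taken at the points `grid`."""
    total = 0.0
    for i in range(len(grid) - 1):
        total += 0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
    return total

def model(x):
    """Hypothetical stand-in for the network output at the point x."""
    return x ** 2

# Same grid as myDomain above: 0.0, 0.1, ..., 9.9
domain = [i / 10 for i in range(100)]
integral_term = trapezoid([model(x) for x in domain], domain)
```

The grid is independent of the training data, so the integral term covers the whole chosen domain; a finer grid or a higher-order rule trades extra evaluations for accuracy.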