How to add multiple losses into GradientTape - python

I am testing tf.GradientTape. I wrote a model with several output layers, each with its own loss, into which I wanted to integrate the GradientTape. My question is: are there specific techniques for passing the several losses to the gradient computation as targets?
I know one option is to take the mean of the losses. Is that always necessary? Can't I just pass in a list of losses, so that the GradientTape knows which loss belongs to which output layer?

From the TensorFlow documentation: "Unless you set persistent=True a GradientTape can only be used to compute one set of gradients."
To calculate multiple sets of gradients, you need multiple tapes. Something like:
with tf.GradientTape() as t1:
    loss1_result = loss1(true, pred)
grads1 = t1.gradient(loss1_result, var_list1)

with tf.GradientTape() as t2:
    loss2_result = loss2(true, pred)
grads2 = t2.gradient(loss2_result, var_list2)
Then apply them:
opt1.apply_gradients(zip(grads1, var_list1))
opt2.apply_gradients(zip(grads2, var_list2))
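Alternatively, a single tape can handle several losses: tape.gradient accepts a list of targets (their gradients are summed), and a persistent tape lets you pull separate gradients from one forward pass. A minimal sketch, assuming a two-headed Keras model and placeholder names (model, x, y1_true, y2_true, loss1, loss2, opt are not from the original post):

import tensorflow as tf

with tf.GradientTape(persistent=True) as tape:
    pred1, pred2 = model(x, training=True)    # model with two output heads (assumed)
    loss1_result = loss1(y1_true, pred1)
    loss2_result = loss2(y2_true, pred2)
    total_loss = loss1_result + loss2_result  # or a weighted sum

# One combined update: gradient of the summed loss w.r.t. all variables.
grads = tape.gradient(total_loss, model.trainable_variables)
opt.apply_gradients(zip(grads, model.trainable_variables))

# Or, because the tape is persistent, separate gradients per loss:
grads1 = tape.gradient(loss1_result, model.trainable_variables)
grads2 = tape.gradient(loss2_result, model.trainable_variables)
del tape  # release the persistent tape's resources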

Related

How should I keep track of total loss while training a network with a batched dataset?

I am attempting to train a discriminator network by applying gradients to its optimizer. However, when I use a tf.GradientTape to compute the gradients of the loss w.r.t. the trainable variables, None is returned. Here is the training loop:
def train_step():
    # Generate noisy seeds
    noise = tf.random.normal([BATCH_SIZE, noise_dim])
    with tf.GradientTape() as disc_tape:
        pattern = generator(noise)
        pattern = tf.reshape(tensor=pattern, shape=(28, 28, 1))
        dataset = get_data_set(pattern)
        disc_loss = tf.Variable(shape=(1, 2), initial_value=[[0, 0]], dtype=tf.float32)
        disc_tape.watch(disc_loss)
        for batch in dataset:
            disc_loss.assign_add(discriminator(batch, training=True))
    disc_gradients = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
Code Description
The generator network generates a 'pattern' from noise. I then generate a dataset from that pattern by applying various convolutions to the tensor. The dataset that is returned is batched, so I iterate through the dataset and keep track of the loss of my discriminator by adding the loss from this batch to the total loss.
What I do know
tf.GradientTape returns None when there is no graph connection between the two variables. But isn't there a graph connection between the loss and the trainable variables? I believe my mistake has something to do with how I keep track of the loss in the disc_loss tf.Variable.
My Question
How do I keep track of loss while iterating through a batched dataset so that I may use it later to calculate gradients?
The base answer here is that the assign_add function of tf.Variable is not differentiable, so no gradient can be computed between the variable disc_loss and the discriminator's trainable variables.
In this very specific case, the fix was to replace the assign_add call with
disc_loss = disc_loss + discriminator(batch, training=True)
In future cases of similar problems, be sure to check that all operations used while being watched by the gradient tape are differentiable.
This link has a list of differentiable and non-differentiable TensorFlow ops; I found it very useful.
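Putting the fix together, the loss can be accumulated as a plain tensor instead of a tf.Variable, which keeps the graph connection intact. A minimal sketch of the corrected loop, reusing the asker's names (BATCH_SIZE, noise_dim, generator, get_data_set, discriminator) and assuming an optimizer named optimizer exists:

def train_step():
    noise = tf.random.normal([BATCH_SIZE, noise_dim])
    with tf.GradientTape() as disc_tape:
        pattern = generator(noise)
        pattern = tf.reshape(tensor=pattern, shape=(28, 28, 1))
        dataset = get_data_set(pattern)
        disc_loss = 0.0  # accumulate with tensor ops, not assign_add
        for batch in dataset:
            disc_loss = disc_loss + discriminator(batch, training=True)
    disc_gradients = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    optimizer.apply_gradients(zip(disc_gradients, discriminator.trainable_variables))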

Training a model with a single output on multiple losses in Keras

I am building an image segmentation model using Keras and I want to train my model on multiple loss functions. I have seen this link, but I am looking for a simpler and more straightforward solution, as my loss functions are quite complex. Can someone tell me how to build a model with a single output trained on multiple losses in Keras?
You can use multiple losses with one output by using a weighted loss, i.e. a sum of your losses, each multiplied by a weight. Create a custom loss that returns this weighted sum and pass it to model.compile. There is an example here.
This is just an example from there; you can play around with it.
def custom_losses(y_true, y_pred):
    alpha = 0.6
    # Weighted sum of mean squared error and the Huber loss.
    squared_difference = tf.square(y_true - y_pred)
    huber = tf.keras.losses.huber(y_true, y_pred)
    return tf.reduce_mean(squared_difference, axis=-1) + alpha * huber

model.compile(optimizer='adam', loss=custom_losses, metrics=['MeanSquaredError'])
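If you want the weighting to be configurable rather than hard-coded, a common pattern is a loss factory that closes over the weight. A small sketch (make_weighted_loss is an illustrative name, not from the original answer):

import tensorflow as tf

def make_weighted_loss(alpha=0.6):
    # Returns a Keras-compatible loss: MSE plus alpha times the Huber loss.
    def weighted_loss(y_true, y_pred):
        mse = tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)
        huber = tf.keras.losses.huber(y_true, y_pred)
        return mse + alpha * huber
    return weighted_loss

model.compile(optimizer='adam', loss=make_weighted_loss(alpha=0.4),
              metrics=['MeanSquaredError'])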

Approximation of a function with multi-dimensional output using a Keras neural network

As part of a project for my studies I want to try to approximate a function f: R^m -> R^n using a Keras neural network (I am completely new to neural networks). The network seems to be learning to some (admittedly unsatisfactory) degree, but its predictions don't resemble the expected results in the slightest.
I have two numpy-arrays containing the training-data (the m-dimensional input for the function) and the training-labels (the n-dimensional expected output of the function). I use them for training my Keras model (see below), which seems to be learning on the provided data.
inputs = Input(shape=(m,))
hidden = Dense(100, activation='sigmoid')(inputs)
hidden = Dense(80, activation='sigmoid')(hidden)
outputs = Dense(n, activation='softmax')(hidden)

opti = tf.keras.optimizers.Adam(lr=0.001)
model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=opti,
              loss='poisson',
              metrics=['accuracy'])
model.fit(training_data, training_labels, verbose=2, batch_size=32, epochs=30)
When I call the evaluate-method on my model with a set of test-data and a set of test-labels, I get an apparent accuracy of more than 50%. However, when I use the predict method, the predictions of the network do not resemble the expected results in the slightest. For example, the first ten entries of the expected output are:
[0., 0.08193582, 0.13141066, 0.13495408, 0.16852582, 0.2154705 ,
0.30517559, 0.32567417, 0.34073457, 0.37453226]
whereas the first ten entries of the predicted results are:
[3.09514281e-09, 2.20849714e-03, 3.84095078e-03, 4.99367528e-03,
6.06226595e-03, 7.18442770e-03, 8.96730460e-03, 1.03423093e-02, 1.16029680e-02, 1.31887039e-02]
Does this have something to do with the metrics I use? Could the results be normalized by Keras in some opaque way? Have I just used the wrong kind of model for the problem I want to solve? What does 'accuracy' mean anyway?
Thank you in advance for your help, I am new to neural networks and have been stuck with this issue for several days.
The problem is with this line:
outputs = Dense(n, activation='softmax')(hidden)
Softmax activation is used only in classification problems, where we need a probability distribution over the classes as the output of the network. Softmax ensures that the outputs sum to one and are non-zero (which is exactly what you are observing). But the problem at hand is not a classification task: you are trying to predict continuous target variables, so use a linear activation function instead. Modify the line above to something like this:
outputs = Dense(n, activation='linear')(hidden)
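For completeness, here is a minimal sketch of the corrected regression setup; swapping the Poisson loss for plain MSE and the accuracy metric for mean absolute error are assumptions on my part, since 'accuracy' is not meaningful for continuous targets:

inputs = Input(shape=(m,))
hidden = Dense(100, activation='sigmoid')(inputs)
hidden = Dense(80, activation='sigmoid')(hidden)
outputs = Dense(n, activation='linear')(hidden)  # linear head for regression

model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='mse',        # a standard regression loss (assumption)
              metrics=['mae'])   # mean absolute error instead of accuracy
model.fit(training_data, training_labels, verbose=2, batch_size=32, epochs=30)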

Implementing Intersection over Union Loss Using TensorFlow

This may be more of a TensorFlow gradient question. I have been attempting to implement Intersection over Union (IoU) as a loss and have been running into some problems. To the point, here is the snippet of my code that computes the IoU:
def get_iou(masks, predictions):
    ious = []
    for i in range(batch_size):
        mask = masks[i]
        pred = predictions[i]
        masks_sum = tf.reduce_sum(mask)
        predictions_sum = tf.reduce_sum(pred)
        # union = |mask| + |pred| - |intersection|
        intersection = tf.reduce_sum(tf.multiply(mask, pred))
        union = masks_sum + predictions_sum - intersection
        iou = intersection / union
        ious.append(iou)
    return ious

iou = get_iou(masks, predictions)
mean_iou_loss = -tf.log(tf.reduce_sum(iou))
train_op = tf.train.AdamOptimizer(0.001).minimize(mean_iou_loss)
It works as predicted. However, the issue I am having is that the losses do not decrease. The model does train, though the results are less than ideal, so I am wondering if I am implementing it correctly. Do I have to compute the gradients myself? I can compute the gradients for this IoU loss derived in this paper using tf.gradients(), though I am not sure how to incorporate that with tf.train.AdamOptimizer(). Reading the documentation, I feel like compute_gradients and apply_gradients are the commands I need to use, but I can't find any examples of how to use them. My understanding is that the TensorFlow graph should be able to come up with the gradient itself via the chain rule, so is a custom gradient even necessary in this problem? If a custom gradient is not necessary, then I may just have an ill-posed problem and need to adjust some hyperparameters.
Note: I have tried TensorFlow's implementation of the IoU, tf.metrics.mean_iou(), but it spits out inf every time, so I have abandoned it.
Gradient computation occurs inside the optimizer's minimize function, so no explicit use inside the loss function is needed. However, your implementation simply lacks an optimizable, trainable variable.
iou = get_iou(masks, predictions)
mean_iou_loss = tf.Variable(initial_value=-tf.log(tf.reduce_sum(iou)), name='loss', trainable=True)
train_op = tf.train.AdamOptimizer(0.001).minimize(mean_iou_loss)
Numerical stability, differentiability and the particular implementation aside, this should be enough to use it as a loss function that changes with iterations.
Also take a look at:
https://arxiv.org/pdf/1902.09630.pdf (the Generalized IoU paper)
Why does one not use IOU for training?
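As a side note, the per-sample Python loop in get_iou can be replaced by a fully vectorized soft-IoU loss. A minimal sketch against the same TF 1.x API the question uses; the mask shape and the epsilon for numerical stability are assumptions of mine:

def soft_iou_loss(masks, predictions, eps=1e-7):
    # masks, predictions: [batch, H, W] tensors with values in [0, 1] (assumed).
    intersection = tf.reduce_sum(masks * predictions, axis=[1, 2])
    union = (tf.reduce_sum(masks, axis=[1, 2])
             + tf.reduce_sum(predictions, axis=[1, 2])
             - intersection)
    iou = (intersection + eps) / (union + eps)
    return 1.0 - tf.reduce_mean(iou)  # minimizing this maximizes the mean IoU

loss = soft_iou_loss(masks, predictions)
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)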

What are y_pred and y_true when calculating loss in a multi-output Keras model?

I currently have a multi-output model:
model = Model(inputs=x1, outputs=[y1, y2])
model.compile(optimizer='sgd', loss=[custom_loss, 'mse'])
What are the y_pred and y_true values here for the mse loss function? Is the y_true for mse the output of y2 alone, or both y1 and y2?
In my custom_loss I need access to y_true and y_pred from both outputs separately for the calculation:
def custom_loss(y1_true, y1_pred, y2_true, y2_pred):
How can I do this?
Unfortunately you cannot define a 'global' loss function.
A loss function is always computed on one output only (see the pseudo-code in the accepted answer).
In your example the custom loss will be computed on y1_true and y1_pred, while the mse will be computed on y2_true and y2_pred.
If you want a custom loss that involves both the y1 and y2 outputs, I can think of two ways:
Option 1: collapse the multiple outputs into one output. If y1 and y2 are similar vectors, you can concatenate them so that the model has a single output; in your custom loss you then apply some indexing/slicing to separate the two parts again (see the sketch after this answer).
Option 2: make the loss an output of your model. Create a custom network graph (using the Keras functional API and the backend) that computes the loss, taking y1_true and y2_true as additional inputs to the network. Your final model then has three outputs: y1_pred, y2_pred and the loss. After training you can discard the parts of the model you are no longer interested in (the y_true inputs and the loss output).
I remember that I had a similar problem in the past and I chose to implement option 2, but it was kind of a pain.
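A minimal sketch of option 1, assuming y1 and y2 are vectors of known lengths n1 and n2 (all names here, including custom_part, are illustrative, not from the original question):

import tensorflow as tf
from tensorflow.keras.layers import Concatenate
from tensorflow.keras.models import Model

# Single concatenated output: [y1 (n1 values) | y2 (n2 values)].
merged = Concatenate(axis=-1)([y1, y2])
model = Model(inputs=x1, outputs=merged)

def combined_loss(y_true, y_pred):
    # Slice the concatenated tensors back into the two logical outputs.
    y1_true, y2_true = y_true[:, :n1], y_true[:, n1:]
    y1_pred, y2_pred = y_pred[:, :n1], y_pred[:, n1:]
    mse = tf.reduce_mean(tf.square(y2_true - y2_pred), axis=-1)
    return custom_part(y1_true, y1_pred) + mse  # custom_part is your own loss term

model.compile(optimizer='sgd', loss=combined_loss)

Note that the training targets passed to model.fit must then be concatenated in the same order as the outputs.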
