Pytorch custom/modified GAN loss equivalent - python

I'm trying to implement a Pytorch version of Creative Adversarial Networks, a GAN with a modified/custom loss function.
Here are the formulae for the loss function. I'm using Pytorch's nn.CrossEntropyLoss for the discriminator's modified loss function, and it seems to be working, as its loss decreases over epochs. However, I don't think nn.CrossEntropyLoss is suitable for the generator: it expects Long rather than Float tensors, and the paper's loss function, particularly the generator's loss, seems to me to require floats.
This is my current (initial) thinking for the generator's custom loss:
y_dim is the number of classes
disc_class_layer = FC layer that outputs a style/class given an input image
The for loop attempts to be equivalent to:
Σ_{k=1}^{K} [ (1/K) log(D_c(c_k | G(z))) + (1 − 1/K) log(1 − D_c(c_k | G(z))) ]
class CanGLoss(nn.Module):
    def __init__(self, y_dim, labels, disc_class_layer):
        super(CanGLoss, self).__init__()
    def forward(self, inp):
        style_loss = 0
        for i in range(1, y_dim + 1):
            style_loss += (1/i)*torch.log(disc_class_layer(inp)) + (1 - (1/i))*torch.log(1 - disc_class_layer(inp))
        return style_loss*-1
Is this on the right track? I am new to custom loss functions and Pytorch and not sure this is the way to go.
Any help would be great!
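For reference, here is a minimal sketch of how the style-ambiguity term above could be written as a module. It assumes the discriminator's class head returns K raw logits per image and converts them to probabilities with softmax; the class and argument names are illustrative, not from the post.

import torch
import torch.nn as nn
import torch.nn.functional as F

class StyleAmbiguityLoss(nn.Module):
    """Sketch of the CAN generator's style-ambiguity term (names are illustrative)."""
    def __init__(self, disc_class_layer, k_classes):
        super().__init__()
        self.disc_class_layer = disc_class_layer
        self.k = k_classes

    def forward(self, fake_images):
        eps = 1e-7                                                     # avoid log(0)
        probs = F.softmax(self.disc_class_layer(fake_images), dim=1)   # shape (N, K)
        per_class = (1.0 / self.k) * torch.log(probs + eps) \
                    + (1.0 - 1.0 / self.k) * torch.log(1.0 - probs + eps)
        # the generator maximises the sum over classes, so minimise its negative
        return -per_class.sum(dim=1).mean()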

Related

Is there a faster way to compute gradients of output wrt inputs in keras/tensorflow (graph mode)?

It seems that the standard way to compute the gradient of the output of a keras model with respect to the input variables (for example, see How to compute gradient of output wrt input in Tensorflow 2.0) is something like the following:
with tf.GradientTape() as tape:
    preds = model(input)
grads = tape.gradient(preds, input)
However, this is extremely slow when the input tensor is large (e.g. ten million observations of 500 input variables). The above code also does not seem to use the GPU at all.
When training the model using model.fit(input), it runs on the GPU and is super fast, despite the large input tensor.
Is there any way to speed up the gradient calculation?
About version
I am running Python 3.8 and Tensorflow v2.9.1. For various reasons I can only run in graph mode--i.e., tf.compat.v1.disable_eager_execution().
Thanks in advance!
The problem is that you are not handling batches, or at least this is what I understand from the info you have given.
According to the fit() documentation, the function takes an argument batch_size, which defaults to 32:
batch_size: Integer or None. Number of samples per gradient update. If
unspecified, batch_size will default to 32
However, with a gradient tape you have to handle batches manually. The input in your code must be a single batch.
This means you should have something like the following code:
@tf.function
def step(model, x):
    with tf.GradientTape() as tape:
        tape.watch(x)  # inputs are not trainable variables, so watch them explicitly
        preds = model(x)
    return tape.gradient(preds, x)

for epoch in range(epochs):
    # Iterate over the batches of the dataset.
    for batch in range(num_batch):
        images = x_train[batch * batch_size: (batch + 1) * batch_size]
        labels = y_train[batch * batch_size: (batch + 1) * batch_size]
        # calling the tape on a single batch
        grads = step(model, images)
Also, in order to improve performance, I've wrapped the gradient tape inside a tf.function. On the first call, this decorator compiles a static graph of the operations inside the function it decorates, so subsequent calls can be a lot faster. See the guide on better performance with tf.function to know more.
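As a rough illustration of that tracing behaviour (assuming eager execution is enabled, i.e. the TF2 default rather than the question's graph-mode setup), Python side effects only run while the function is being traced, and later calls reuse the compiled graph:

import tensorflow as tf

@tf.function
def double(x):
    print("tracing")      # a Python side effect: runs only during tracing
    return x * 2

double(tf.constant(1.0))  # first call traces and builds the graph, prints "tracing"
double(tf.constant(2.0))  # same input signature: the cached graph is reused, nothing printed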

Simple L1 loss in PyTorch

I want to calculate L1 loss in a neural network, I came across this example at https://discuss.pytorch.org/t/simple-l2-regularization/139/2, but there are some errors in this code.
Is this really how to calculate L1 Loss in a NN or is there a simpler way?
l1_crit = nn.L1Loss()
reg_loss = 0
for param in model.parameters():
    reg_loss += l1_crit(param)
factor = 0.0005
loss += factor * reg_loss
Is this equivalent in any way to simply doing:
loss = torch.nn.L1Loss()
I assume not, because I am not passing along any network parameters. I'm just checking whether there is an existing function to do this.
If I understand correctly, you want to compute the L1 loss of your model (as you say at the beginning). However, I think you may have gotten confused by the discussion in the Pytorch forum.
From what I understand of the Pytorch forum thread and the code you posted, the author is trying to regularize the network weights with an L1 penalty, i.e. to encourage the weight values to stay in a sensible range (not too big, not too small). That is weight regularization using the L1 norm (which is why it iterates over model.parameters()). Regularization takes the parameter values as input and produces a penalty term that is added to the loss.
Check this for weight normalization, a related but distinct reparameterization technique: https://pytorch.org/docs/master/generated/torch.nn.utils.weight_norm.html
On the other hand, the L1 loss is just a way to measure how much two values differ from each other, so the "loss" is simply a measure of this difference. In the case of the L1 loss this error is computed as the mean absolute error, loss = |x − y|, where x and y are the values to compare. The loss computation therefore takes two values as input and produces a single value as output.
Check this for loss computing: https://pytorch.org/docs/master/generated/torch.nn.L1Loss.html
To answer your question: no, the two snippets are not equivalent. The first applies L1 regularization to the weights, while in the second you are computing a loss between predictions and targets. This is the loss computation with some context:
sample, target = dataset[i]
target_predicted = model(sample)
loss = torch.nn.L1Loss()
loss_value = loss(target_predicted, target)
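For completeness, here is a minimal sketch of what the forum snippet was presumably aiming for: adding an L1 penalty on the weights to a task loss (model, sample and target are placeholders for your own objects, as above):

import torch.nn as nn

l1_loss = nn.L1Loss()
task_loss = l1_loss(model(sample), target)    # prediction error

# L1 penalty: sum of absolute values of every parameter
l1_penalty = sum(p.abs().sum() for p in model.parameters())

factor = 0.0005
total_loss = task_loss + factor * l1_penalty
total_loss.backward()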

Slightly adapt L1 loss to a weighted L1 loss in Pytorch, does gradient computation still work properly?

I implemented a neural network in Pytorch and I would like to use a weighted L1 loss function to train the network.
The implementation with the regular L1 loss contains this code for each epoch:
optimiser.zero_grad()
net.train()
_,forecast = net(torch.tensor(feature, dtype=torch.float).to(DEVICE))
loss = F.l1_loss(forecast, torch.tensor(target,dtype=torch.float).to(DEVICE),reduction='mean')
loss.backward()
params.append(net.parameters())
optimiser.step()
Now I want to use a weighted L1 loss instead. So I thought I would use the same standard Pytorch L1 function again and rescale the forecasts and targets with the weights. Will the gradient computation still be done correctly?
optimiser.zero_grad()
net.train()
_,forecast = net(torch.tensor(feature, dtype=torch.float).to(DEVICE))
loss = F.l1_loss(torch.t(torch.mul(torch.t(forecast),
                                   torch.tensor(weight, dtype=torch.float).to(DEVICE))),
                 torch.t(torch.mul(torch.t(torch.tensor(target, dtype=torch.float).to(DEVICE)),
                                   torch.tensor(weight, dtype=torch.float).to(DEVICE))),
                 reduction='mean')
loss.backward()
params.append(net.parameters())
optimiser.step()
Yes, it will be correct.
As long as you are not using in-place operations, the gradients will be computed correctly. Besides, current versions of Pytorch raise a runtime error if an in-place operation accidentally overwrites a tensor that is needed for the gradient computation.
Here is a related discussion where you can find more information.
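As a side note, since |w·f − w·t| = w·|f − t| for non-negative weights, the same weighted L1 loss can also be written directly, which avoids the double transpose. A small sketch, assuming one weight per output column that broadcasts over the batch dimension:

import torch

def weighted_l1_loss(forecast, target, weight):
    # weight scales the absolute error before averaging; for non-negative
    # weights this matches F.l1_loss(weight * forecast, weight * target)
    return (weight * (forecast - target).abs()).mean()

# dummy usage: a batch of 8 forecasts over 3 horizons, one weight per horizon
forecast = torch.randn(8, 3, requires_grad=True)
target = torch.randn(8, 3)
weight = torch.tensor([1.0, 0.5, 0.25])
weighted_l1_loss(forecast, target, weight).backward()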

Incorporate side conditions into Keras neural network

I want to train my neural network (in Keras) with an additional condition on the output elements.
An example:
Minimize my loss function MSE between network output y_pred and y_true.
Additionally, ensure that the norm of y_pred is less than or equal to 1.
Without the condition, the task is straightforward.
Note: The condition is not necessarily the vector norm of y_pred.
How can I implement the additional condition/restriction in a Keras (or maybe Tensorflow) model?
In principle, tensorflow (and keras) don't allow you to add hard constraints to your model.
You have to convert your invariant (norm <= 1) into a penalty function that is added to the loss. This could look like this:
y_norm = tf.norm(y_pred)
norm_loss = tf.where(y_norm > 1.0, y_norm, 0.0)
total_loss = mse + norm_loss
Look at the docs of where. If your prediction has a norm bigger than one, backpropagation tries to minimize the norm. If it is less than or equal to one, this part of the loss is simply 0 and no gradient is produced.
But this can be very hard to optimize. Your predictions could oscillate around a norm of 1. It is also possible to add a factor: total_loss = mse + 1000* norm_loss. Be very careful with this, it makes optimization even harder.
In the example above, the norm above one contributes linearly to the loss. This is called l1-regularization. You could also square it, which would become l2-regularization.
In your specific case, you could get creative. Why not normalize your predictions and the targets to one (just a suggestion, might be a bad idea)?
loss = mse(y_pred / tf.norm(y_pred), y_target / np.linalg.norm(y_target))
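Putting the pieces together, a custom loss along these lines could be passed to compile(). This is just a sketch: the per-sample norm over the last axis and the penalty factor are assumptions about how you want to aggregate over the batch.

import tensorflow as tf

def mse_with_norm_penalty(y_true, y_pred, factor=1.0):
    mse = tf.reduce_mean(tf.square(y_true - y_pred))
    y_norm = tf.norm(y_pred, axis=-1)   # norm of each prediction in the batch
    penalty = tf.reduce_mean(tf.where(y_norm > 1.0, y_norm, tf.zeros_like(y_norm)))
    return mse + factor * penalty

# model.compile(optimizer="adam", loss=mse_with_norm_penalty)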

Compute gradients for each time step of tf.while_loop

Given a TensorFlow tf.while_loop, how can I calculate the gradient of x_out with respect to all weights of the network for each time step?
network_input = tf.placeholder(tf.float32, [None])
steps = tf.constant(0.0)
weight_0 = tf.Variable(1.0)
layer_1 = network_input * weight_0

def condition(steps, x):
    return steps <= 5

def loop(steps, x_in):
    weight_1 = tf.Variable(1.0)
    x_out = x_in * weight_1
    steps += 1
    return [steps, x_out]

_, x_final = tf.while_loop(
    condition,
    loop,
    [steps, layer_1]
)
Some notes
In my network the condition is dynamic. Different runs will run the while loop a different number of times.
Calling tf.gradients(x, tf.trainable_variables()) crashes with AttributeError: 'WhileContext' object has no attribute 'pred'. It seems like the only possibility to use tf.gradients within the loop is to calculate the gradient with respect to weight_1 and the current value of x_in / time step only without backpropagating through time.
In each time step, the network is going to output a probability distribution over actions. The gradients are then needed for a policy gradient implementation.
You can't call tf.gradients inside tf.while_loop in Tensorflow, based on this and this. I found this out the hard way when I was trying to implement conjugate gradient descent entirely inside the Tensorflow graph.
But if I understand your model correctly, you could make your own version of an RNNCell and wrap it in tf.nn.dynamic_rnn. The actual cell implementation will be a little complex, since you need to evaluate a condition dynamically at runtime.
For starters, you can take a look at Tensorflow's dynamic_rnn code here.
Alternatively, dynamic graphs have never been Tensorflow's strong suit, so consider using other frameworks like PyTorch, or try out eager execution and see if that helps.
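For comparison, here is a tiny sketch of the same computation in PyTorch, where the loop length can depend on runtime values and gradients still flow through every iteration without special handling (the shapes and loop bound mirror the question's toy example):

import torch

network_input = torch.randn(4)
weight_0 = torch.ones(1, requires_grad=True)
weight_1 = torch.ones(1, requires_grad=True)

x_out = network_input * weight_0
steps = 0
while steps <= 5:              # an ordinary Python loop; its length may be data-dependent
    x_out = x_out * weight_1
    steps += 1

# gradients of the final output w.r.t. both weights, through all time steps
grads = torch.autograd.grad(x_out.sum(), [weight_0, weight_1])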
