PyTorch - Optimizer is not updating its specified parameter

I'm trying to implement CLIP-based style transfer. The full code is here.
For some unknown reason, the optimizer doesn't change the weights of the latent tensor. I can confirm that its values are identical before and after the iteration steps. I've also made sure that requires_grad is True, and I have tried various loss functions and optimizers.
Any idea why it doesn't work?

I see some problems with your code.
The optimizer takes in parameters, and parameters are supposed to be leaf nodes in your computation graph. In your case, you tell the optimizer to use latent as the parameter, but it must have complained, since latent is the result of some computations.
So you detached latent, which does make it a leaf node. But detaching creates a new latent tensor that is cut off from the old computation graph, so whatever produced the original latent no longer takes part in the optimization.
Also, to optimize a parameter, the loss must be a function of that parameter. I cannot see whether you are using latent in your loss computation, so that could be another issue.
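A minimal sketch of the fix, with made-up stand-ins for the question's tensors (the real code would decode latent and feed the result to CLIP before computing the loss):

import torch

# Hypothetical stand-ins: the initial latent is the result of some
# computation, so on its own it would not be a leaf tensor.
encoder_output = torch.randn(1, 16) * 2.0
target = torch.zeros(1, 16)

# detach + clone produces a fresh leaf tensor the optimizer may update.
latent = encoder_output.detach().clone().requires_grad_(True)

optimizer = torch.optim.Adam([latent], lr=0.1)

for step in range(50):
    optimizer.zero_grad()
    # The loss must be a differentiable function of `latent`.
    loss = torch.nn.functional.mse_loss(latent, target)
    loss.backward()
    optimizer.step()

print(latent.grad is not None)  # True: gradients reach the leaf
print(loss.item())              # decreases over the iterations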

I think I've found the issue. On line 86, where I compute a one-hot vector from latent (in order to decode it and pass it to CLIP), the graph breaks: vae_make_onehot returns a new leaf tensor, so gradients never flow back to latent.
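If a hard one-hot step is unavoidable, one common workaround is a straight-through estimator: keep the hard one-hot values in the forward pass but route gradients through a softmax. A sketch under that assumption, where onehot_straight_through is a hypothetical replacement for vae_make_onehot:

import torch
import torch.nn.functional as F

def onehot_straight_through(logits):
    # Hard one-hot forward, soft gradients backward.
    soft = F.softmax(logits, dim=-1)
    index = soft.argmax(dim=-1, keepdim=True)
    hard = torch.zeros_like(soft).scatter_(-1, index, 1.0)
    # Equals `hard` in value, but its gradient is that of `soft`,
    # so the graph back to `logits` stays intact.
    return (hard - soft).detach() + soft

logits = torch.randn(2, 5, requires_grad=True)
y = onehot_straight_through(logits)
(y * torch.randn_like(y)).sum().backward()  # any downstream loss
print(logits.grad is not None)              # True: no broken graph

torch.nn.functional.gumbel_softmax(logits, hard=True) implements a stochastic variant of the same trick, if randomness in the one-hot samples is acceptable.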

Related

Forward function with multiple outputs?

Typically, the forward function in a PyTorch nn.Module computes and returns predictions for the inputs of the forward pass. Sometimes, though, intermediate computations can be useful to return as well. For example, for an encoder, one might need to return both the encoding and the reconstruction from the forward pass, to be used later in the loss.
Question: can the forward function of PyTorch's nn.Module return multiple outputs, e.g. a tuple consisting of predictions and intermediate values?
Does such a return value mess up backpropagation or autograd?
If it does, how would you handle cases where multiple functions of the input appear in the loss function?
(The question should be valid for TensorFlow too.)
"The question should be valid in Tensorflow too", but PyTorch and Tensorflow are different frameworks. I can answer for PyTorch at least.
Yes you can return a tuple containing any final and or intermediate result. And this does not mess up back propagation since the graph is saved implicitly from the tensors outputs using callbacks and cached tensors.
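A minimal illustration (toy module, made-up shapes):

import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encode = nn.Linear(8, 3)
        self.decode = nn.Linear(3, 8)

    def forward(self, x):
        z = self.encode(x)        # intermediate value
        x_hat = self.decode(z)    # final prediction
        return x_hat, z           # returning a tuple is fine

model = AutoEncoder()
x = torch.randn(4, 8)
x_hat, z = model(x)

# Both outputs can enter the loss; autograd tracks each branch.
loss = ((x_hat - x) ** 2).mean() + 1e-3 * z.pow(2).mean()
loss.backward()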

Why does not using retain_graph=True result in an error?

If I need to backpropagate through a neural network twice and I don't use retain_graph=True, I get an error.
Why? I realize it is convenient to keep the intermediate values from the first backpropagation around so they can be reused in the second one. But why aren't they simply recalculated, the way they were calculated in the first backpropagation to begin with?
By default, PyTorch doesn't keep the intermediate buffers after a backward pass. Its defining feature is dynamic computational graphs, built afresh on every forward pass, so once backpropagation finishes, the graph is freed and all intermediate buffers are destroyed. Recomputing those buffers would require re-running the forward pass, which PyTorch does not do implicitly; if you need a second backward pass through the same graph, either pass retain_graph=True to the first backward() call or run the forward pass again.
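A small demonstration of both behaviours:

import torch

x = torch.ones(3, requires_grad=True)
y = (x * x).sum()              # x * x saves x for the backward pass

y.backward(retain_graph=True)  # buffers kept alive for a second pass
y.backward()                   # works: the graph was retained

z = (x * x).sum()
z.backward()                   # default: frees the graph's buffers
# z.backward()                 # would raise RuntimeError: "Trying to
#                              # backward through the graph a second time"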

No gradients provided for any variable for increasing Loss value inside tf.while_loop

I have a CNN architecture which consists of several layers -- convolution, fully-connected, and deconvolution (call this the first process). The last deconvolution layer gives me points as its output, and I need to do some processing on this output (call it the second process) to get the loss value.
In the second process I use tf.while_loop to calculate the loss, because the total loss is obtained by adding up the loss values from each iteration of the loop. I initialize the loss with tf.constant(0) before looping.
When I try to train and minimize that loss, I get a "No gradients provided" error between the output of the first process and the loss tensor.
The second process looks like this:
loss = tf.constant(0.0)  # float init, so tf.add below doesn't mix dtypes
i = tf.constant(0)

def cond(i, loss):
    return tf.less(i, tf.size(xy))

def body(i, loss):
    # xy is the output from the first process
    xy_f = tf.cast(xy, tf.float32)
    x = tf.reduce_mean(xy_f)
    loss = tf.add(loss, x)
    return [tf.add(i, 1), loss]

r = tf.while_loop(cond, body, [i, loss])
optimizer.minimize(r[1])
I also do some processing inside the second process which (as I read in many posts, especially here) doesn't provide gradients.
Any help would be really appreciated.
There are several reasons why you can get that error. Without seeing your original code it is hard to debug, but here are at least two reasons why gradients aren't provided:
There are some TensorFlow operations through which gradients cannot flow, i.e. through which backpropagation cannot occur, for example tf.cast (at least when integer types are involved) or tf.assign. In the post you linked, there's a comment that mentions this. So in the example you provided, tf.cast will definitely cause an issue.
A solution to this problem would be to restructure your code so that you don't use TensorFlow operations that block gradients (see the sketch after this answer).
A second reason this can occur is trying to optimize variables with a loss that was not calculated from those variables. For example, if you calculated the loss in your first process from conv1's variables, and then in your second process you try to update/optimize conv2's variables, it will not work, since the gradients are calculated for conv1's variables and not conv2's.
It looks like in your case it is most likely the first issue and not the second one.
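As a sketch of the first fix, here is the loop restructured in 1.x-style graph code so that no gradient-blocking op sits between the output and the loss (the variable xy below is a hypothetical stand-in for the first process's output):

import tensorflow as tf  # 1.x-style graph code, matching the question

# Hypothetical stand-in: in the real model, xy would be the float32
# output of the deconvolution layer, not a fresh variable.
xy = tf.Variable(tf.random_normal([10]))

i0 = tf.constant(0)
loss0 = tf.constant(0.0)

def cond(i, loss):
    return tf.less(i, tf.size(xy))

def body(i, loss):
    # xy stays float32 end to end: no tf.cast in the gradient path.
    return [tf.add(i, 1), tf.add(loss, tf.reduce_mean(xy))]

_, total_loss = tf.while_loop(cond, body, [i0, loss0])
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(total_loss)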

What is the meaning of 'self.diff' in 'forward' of a custom python loss layer for Caffe training?

I am trying to use a custom Python loss layer. Checking several examples online, such as:
a Euclidean loss layer, a Dice loss layer,
I notice that a variable self.diff is always assigned in forward. For the Dice loss layer in particular:
self.diff[...] = bottom[1].data
I wonder whether there is any reason this variable has to be introduced in forward, or whether I can just use bottom[1].data to access the ground-truth labels.
In addition, what is the point of top[0].reshape(1) in reshape, since by definition the loss output of forward is already a scalar?
You set self.diff so that a value computed in forward is still available in backward: the gradient with respect to the bottom blobs is computed from it, and caching it on the layer object means backward does not have to recompute it. bottom is only a local argument of each method, so anything you want to share between forward and backward has to live on self; inside forward itself, nothing stops you from reading bottom[1].data directly.
As for top[0].reshape(1): Caffe blobs must be given an explicit shape before they are written to, and reshape is where a layer declares the shapes of its outputs. Reshaping the top blob to a single element is what makes the loss output a scalar blob; forward then just fills it in. It also keeps the layer correct if someone later expands the inputs to work with vectors or matrices.
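For reference, here is the widely circulated Python Euclidean loss layer in full (lightly commented; your Dice layer will differ in details). It shows forward filling self.diff and backward consuming it:

import caffe
import numpy as np

class EuclideanLossLayer(caffe.Layer):
    def setup(self, bottom, top):
        if len(bottom) != 2:
            raise Exception("Need two inputs to compute distance.")

    def reshape(self, bottom, top):
        if bottom[0].count != bottom[1].count:
            raise Exception("Inputs must have the same dimension.")
        # Buffer shared between forward and backward.
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        # Declare the loss output as a single-element (scalar) blob.
        top[0].reshape(1)

    def forward(self, bottom, top):
        self.diff[...] = bottom[0].data - bottom[1].data
        top[0].data[...] = np.sum(self.diff ** 2) / bottom[0].num / 2.

    def backward(self, top, propagate_down, bottom):
        # backward has no access to forward's locals; it reads self.diff.
        for i in range(2):
            if not propagate_down[i]:
                continue
            sign = 1 if i == 0 else -1
            bottom[i].diff[...] = sign * self.diff / bottom[i].num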

Train a feed forward neural network indirectly

I am faced with this problem:
I have to build an FFNN that approximates an unknown function f: R^2 -> R^2. The data I have for checking the net are one-dimensional, i.e. values in R. I know the function g: R^2 -> R that maps the output of the net into the space of my data. So I would use the neural network as a filter against bias in the data. But I am faced with two problems:
Firstly, how can I train my network in this way?
Secondly, I am thinking about adding an extra hidden layer that maps R^2 -> R, letting the net train itself to find the correct maps, and then removing the extra layer. Would this approach be correct? That is, would the output be the one I was looking for?
Your idea with the additional layer is good, although the problem is that the weights in this layer have to be fixed. So in practice, you have to compute the partial derivatives of your R^2 -> R mapping, which can then be used as the error to propagate through your network during training. Unfortunately, this may run into the well-known "vanishing gradient problem", which stalled the development of neural networks for many years.
In short, you can either manually compute the partial derivatives of g and, given the expected output in R, feed the resulting "backpropagated" errors to the network that learns the R^2 -> R^2 mapping; or, as you said, create the additional layer and train it normally, but with the top layer's weights held constant (which will require some changes in the implementation).
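For what it's worth, a modern autodiff framework removes the manual derivative bookkeeping entirely: compose the known g after the network and let backpropagation differentiate through both. A minimal PyTorch sketch, with an arbitrary made-up g standing in for the real mapping:

import torch
import torch.nn as nn

def g(u):
    # Made-up differentiable stand-in for the known g: R^2 -> R.
    return (u[:, 0] ** 2 + torch.sin(u[:, 1])).unsqueeze(1)

net = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

x = torch.randn(64, 2)  # inputs in R^2
y = torch.randn(64, 1)  # observed one-dimensional data in R

for step in range(200):
    opt.zero_grad()
    pred = g(net(x))    # compose the fixed g after the network
    loss = nn.functional.mse_loss(pred, y)
    loss.backward()     # autograd differentiates through g as well
    opt.step()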
