Forward function with multiple outputs? - python

Typically, the forward function of a PyTorch nn.Module computes and returns the predictions for the inputs fed through the forward pass. Sometimes, though, it is useful to also return intermediate computations. For example, an encoder might need to return both the encoding and the reconstruction from the forward pass so they can be used later in the loss.
Question: Can PyTorch's nn.Module forward function return multiple outputs, e.g. a tuple consisting of the predictions and intermediate values?
Does such a return value not mess up the backward propagation or autograd?
If it does, how would you handle cases where multiple functions of input are incorporated in the loss function?
(The question should be valid in TensorFlow too.)

"The question should be valid in Tensorflow too", but PyTorch and Tensorflow are different frameworks. I can answer for PyTorch at least.
Yes you can return a tuple containing any final and or intermediate result. And this does not mess up back propagation since the graph is saved implicitly from the tensors outputs using callbacks and cached tensors.
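For illustration, here is a minimal runnable sketch (the module and tensor names are mine, not from the question) of a forward that returns both the reconstruction and the encoding, with both feeding the loss:

import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self, dim=8, hidden=3):
        super().__init__()
        self.encoder = nn.Linear(dim, hidden)
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x):
        encoding = self.encoder(x)
        reconstruction = self.decoder(encoding)
        # Returning a tuple is fine; autograd tracks every output tensor.
        return reconstruction, encoding

model = TinyAutoencoder()
x = torch.randn(4, 8)
reconstruction, encoding = model(x)

# Both outputs can enter the loss; backward() flows through the shared graph.
loss = nn.functional.mse_loss(reconstruction, x) + 0.1 * encoding.pow(2).mean()
loss.backward()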

Related

Difference between calling the .fit() method with .fit(x_train,y_train,[...]) and .fit(train_dataset,[...])

I can see, from the Keras documentation, that the first parameter of the fit method can be, among others:
a numpy array
a tf.data.Dataset
If a Dataset is used, the second parameter (y) should not be passed.
The first way is pretty clear: I indicate the inputs and the labels explicitly.
What I'm having trouble understanding is how, in the second way, TensorFlow can tell which "field" is the label and which fields are the inputs.
I saw in many examples that the map function can accept a function that returns a tuple (input, label), but it can also be used with a single returned value.
Is there a way, for example using from_tensor_slices, to correctly indicate which part is the label when creating a Dataset?
Thank you very much
Technically, this depends on the model. A Keras model has a train_step method which takes a single argument, data (one batch), and runs one step of training (computing outputs, computing the loss, computing and applying gradients, computing metrics). The default implementation uses
x, y = data
where x is used as input and y is used as target.
This implies that your dataset should return the same format: each batch should be a tuple (input, target). You can achieve this by creating the dataset like so:
data = tf.data.Dataset.from_tensor_slices((inputs, labels))
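As a rough, self-contained sketch of the full pattern (the toy arrays and model are mine, not from the question):

import numpy as np
import tensorflow as tf

# Toy data; every dataset element becomes an (input, target) pair.
inputs = np.random.rand(100, 4).astype("float32")
labels = np.random.rand(100, 1).astype("float32")

dataset = tf.data.Dataset.from_tensor_slices((inputs, labels)).shuffle(100).batch(16)

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

# No y argument: Keras unpacks each batch as (x, y) inside train_step.
model.fit(dataset, epochs=2)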

Pytorch - Optimizer is not updating its specified parameter

I'm trying to implement CLIP-based style transfer. The full code is here
For some unknown reason the optimizer doesn't change the weights of the latent tensor. I can confirm that the values are equal before and after the iteration steps. I've also made sure that requires_grad is True and tried various loss functions and optimizers.
Any idea why it doesn't work?
I see some problems with your code.
The optimizer takes in parameters. Parameters are supposed to be leaf nodes in your computation graph. In your case, you tell the optimizer to use latent as the parameter, but it must have complained, since latent is the result of some computations.
So you detached latent, and now it is a leaf node. But detaching creates a new latent tensor that is cut off from the computation graph that produced it.
Also, to optimize a parameter, the loss has to be a function of that parameter. I cannot tell whether you are using latent in your loss computation, so that may be another issue.
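To make the leaf-node point concrete, here is a small sketch with toy tensors (not the poster's code) of turning a computed tensor into a proper leaf parameter and keeping it connected to the loss:

import torch

encoder_output = torch.randn(1, 16) * 2.0                      # pretend this came from an encoder
latent = encoder_output.detach().clone().requires_grad_(True)  # now a leaf node the optimizer can update

optimizer = torch.optim.Adam([latent], lr=0.1)
target = torch.zeros_like(latent)
before = latent.detach().clone()

for _ in range(20):
    optimizer.zero_grad()
    loss = (latent - target).pow(2).mean()  # the loss must depend on latent through an intact graph
    loss.backward()
    optimizer.step()

print(torch.allclose(before, latent))  # False: the parameter actually changed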
I think I've found the issue. On line 86, where I compute the one-hot vector from latent in order to decode it and pass it to CLIP, the graph breaks: vae_make_onehot returns a leaf tensor.

Tensorflow 2.0: Accessing a batch's tensors from a callback

I'm using Tensorflow 2.0 and trying to write a tf.keras.callbacks.Callback that reads both the inputs and outputs of my model for the batch.
I expected to be able to override on_batch_end and access model.inputs and model.outputs, but they are not EagerTensors with values I can access. Is there any way to access the actual tensor values that were involved in a batch?
This has many practical uses, such as writing these tensors to TensorBoard for debugging, or serializing them for other purposes. I am aware that I could just run the whole model again using model.predict, but that would force me to run every input through the network twice (and I might also have a non-deterministic data generator). Any idea how to achieve this?
No, there is no way to access the actual input and output values in a callback. That's just not part of the design goal of callbacks. Callbacks only have access to the model, the arguments to fit, the epoch number and some metric values. As you found, model.input and model.output only point to the symbolic KerasTensors, not to actual values.
To do what you want, you could take the input, stack it (maybe with a RaggedTensor) together with the output you care about, and make that an extra output of your model. Then implement your functionality as a custom metric that only reads y_pred: inside the metric, unstack y_pred to recover the input and output, and then visualize / serialize / etc.
Another way might be to implement a custom Layer that uses py_function to call a function back in Python. This will be very slow during serious training, but may be enough for diagnostics and debugging.
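A minimal sketch of that second idea (the layer and function names here are mine, not an established API):

import tensorflow as tf

def inspect_batch(x):
    # Runs eagerly inside py_function, so x holds the actual batch values.
    tf.print("batch mean:", tf.reduce_mean(x))
    return x

class TapLayer(tf.keras.layers.Layer):
    # Identity layer that hands the real batch tensor to a Python function.
    def call(self, inputs):
        out = tf.py_function(inspect_batch, [inputs], Tout=inputs.dtype)
        out.set_shape(inputs.shape)  # py_function drops static shape information
        return out

# Drop it anywhere into the model, e.g. right after the input:
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    TapLayer(),
    tf.keras.layers.Dense(1),
])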

How to make sure Tensorflow's backpropagation works?

I wrote a custom layer that is part of a neural network and it contains some operations that I am using for the first time such as tf.scan and tf.slice.
I can easily test that the forward pass works and that its output makes sense, but how do I know that it will still work during learning, when it has to do backpropagation? Can I safely assume that everything is going to be fine just because the results make sense in the forward pass?
One possibility might be to create a neural network, replace one or two layers with the custom ones I have just created, train it, and see what happens. However, besides taking quite a long time, the network might learn in the other layers while my custom layer does not work well at all.
In conclusion, is there any way to check that backpropagation will work correctly, so I won't have problems during learning in this layer?
As far as I know, almost all TensorFlow ops are differentiable, including ops such as tf.abs or tf.where, and gradients flow correctly through them. TensorFlow has an automatic differentiation engine that takes any TensorFlow graph and computes derivatives w.r.t. the desired variables.
So if your graph is composed of TensorFlow ops, I wouldn't worry about the gradients being wrong (if you post the code of your layer, I can expand further). However, there are still issues like numerical stability, which can make an otherwise mathematically sound operation fail in practice (e.g. a naive softmax computation, or tf.exp in your graph in general). Apart from that, TensorFlow differentiation should be correct and taken care of from the user's point of view.
If you still want to examine your gradients by hand, you can compute the derivatives in your graph using the tf.gradients op, which gives you the gradients you want so you can check by hand whether TensorFlow did the differentiation correctly. (See https://www.tensorflow.org/api_docs/python/tf/gradients)
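In TF 2.x eager style, a quick sanity check along these lines (toy tensor and ops chosen by me) confirms that gradients flow through ops like tf.scan and tf.slice:

import tensorflow as tf

x = tf.Variable(tf.random.normal([5, 3]))

with tf.GradientTape() as tape:
    scanned = tf.scan(lambda acc, row: acc + row, x)       # cumulative sum over rows
    sliced = tf.slice(scanned, begin=[1, 0], size=[3, 3])  # keep a sub-block
    loss = tf.reduce_sum(tf.square(sliced))

grads = tape.gradient(loss, x)
print(grads)  # non-None, finite values mean backpropagation works through these ops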

What is the meaning of 'self.diff' in 'forward' of a custom python loss layer for Caffe training?

I'm trying to use a custom Python loss layer. When I checked several examples online, such as
a Euclidean loss layer and a Dice loss layer,
I noticed that a variable self.diff is always assigned in forward. For example, in the Dice loss layer:
self.diff[...] = bottom[1].data
I wonder if there is any reason this variable has to be introduced in forward, or whether I can just use bottom[1].data to access the ground-truth label.
In addition, what is the point of top[0].reshape(1) in reshape, since by definition the loss output of forward is already a scalar?
You need to set the diff attribute of the layer for overall consistency and for the layer's data-communication protocol: it is available in other places in the class, and anywhere the loss-layer object appears. bottom is a local parameter and is not available elsewhere in the same form.
In general, the code is written to be extensible to a variety of applications and more complex computations; the reshaping is part of this, ensuring that the returned value is a scalar even if someone expands the inputs to work with vectors or matrices.
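For reference, the standard Euclidean loss Python layer (one of the examples the question mentions) shows why self.diff is cached: forward fills it, and backward reuses it to write the gradients into bottom[i].diff. Roughly:

import numpy as np
import caffe

class EuclideanLossLayer(caffe.Layer):
    def setup(self, bottom, top):
        if len(bottom) != 2:
            raise Exception("Need two inputs (prediction and label).")

    def reshape(self, bottom, top):
        # Allocated here so it persists between forward and backward.
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        top[0].reshape(1)  # the loss blob holds a single scalar

    def forward(self, bottom, top):
        self.diff[...] = bottom[0].data - bottom[1].data
        top[0].data[...] = np.sum(self.diff ** 2) / bottom[0].num / 2.0

    def backward(self, top, propagate_down, bottom):
        for i in range(2):
            if not propagate_down[i]:
                continue
            sign = 1 if i == 0 else -1
            bottom[i].diff[...] = sign * self.diff / bottom[i].num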
