I cannot find an answer to this question in the TensorFlow documentation. I once read that one should manually add losses coming from tf.nn functions, but that this isn't necessary for functions from tf.losses. Therefore:
When should I use tf.losses.add_loss()?
Example:
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=ground_truth, logits=predictions))
tf.losses.add_loss(loss)  # <-- when is this required?
Thank you.
One would use this method to register a user-defined loss.
Namely, if you have created a tensor that defines your loss, for example as my_loss = tf.reduce_mean(output), you can use this method to add it to the loss collection. You might want to do that if you are not tracking all your losses manually, for example if you are using a method like tf.losses.get_total_loss().
The implementation of tf.losses.add_loss is very straightforward:
def add_loss(loss, loss_collection=ops.GraphKeys.LOSSES):
  if loss_collection and not context.executing_eagerly():
    ops.add_to_collection(loss_collection, loss)
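For concreteness, here is a minimal sketch (assuming TF 1.x graph mode; the tensors are made up for illustration) of registering a hand-built loss so that tf.losses.get_total_loss() picks it up:
import tensorflow as tf

# hypothetical model outputs and labels
predictions = tf.random_normal([8, 10])
ground_truth = tf.zeros([8], dtype=tf.int64)

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=ground_truth, logits=predictions))
tf.losses.add_loss(loss)  # register the hand-built loss

# the registered loss is now included with all other collected losses
total_loss = tf.losses.get_total_loss(add_regularization_losses=True)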
Related
I'm currently working on a solution in PyTorch. I'm not going to share the exact solution, but I will provide code that reproduces the issue I'm having.
I have a model defined as follows:
import numpy as np
import torch
import torch.nn as nn
from torch.optim import Adam

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(10, 4)

    def forward(self, x):
        return nn.functional.relu(self.fc1(x))
Then I create an instance: my_model = Net(). Next I create an Adam optimizer like so:
optim = Adam(my_model.parameters())
# create a random input
inputs = torch.tensor(np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2]), dtype=torch.float32, requires_grad=True)
# get the outputs
outputs = my_model(inputs)
# compute gradients / backprop via
outputs.backward(gradient=torch.tensor([1.,1.,1.,5.]))
# store parameters before optimizer step
before_step = list(my_model.parameters())[0].detach().numpy()
# update parameters via
optim.step()
# collect parameters again
after_step = list(my_model.parameters())[0].detach().numpy()
# Print if parameters are the same or not
print(np.array_equal(before_step,after_step)) # Prints True
I provided my model's parameters to the Adam optimizer, so I'm not exactly sure why the parameters aren't updating. I know that in most cases one uses a loss function; however, I cannot do that in my case, but I assumed that if I passed the model's parameters to the optimizer, it would know to connect the two.
Anyone know why the parameters aren't getting updated?
The problem is with detach (docs).
As noted at the bottom:
Returned Tensor shares the same storage with the original one. In-place modifications on either of them will be seen, and may trigger errors in correctness checks
So that is exactly what's happening here. To correctly compare the parameters, you need to clone (docs) them to get a real copy.
list(my_model.parameters())[0].clone().detach().numpy()
On a side note, it can be helpful to check the gradients after optim.step() with print(list(my_model.parameters())[0].grad) to verify that the graph is intact. Also, don't forget to call optim.zero_grad().
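Putting it together, a corrected version of the comparison from the question (same names as above) would be:
# clone() before detach() so the snapshot no longer shares storage
# with the live parameter
before_step = list(my_model.parameters())[0].clone().detach().numpy()
optim.step()
after_step = list(my_model.parameters())[0].clone().detach().numpy()
print(np.array_equal(before_step, after_step))  # now prints False for non-zero gradients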
I am using Keras with TensorFlow in Python. I have a custom loss function that returns a single number for each sample in a batch (so a vector with length equal to the batch size). How can I also specify a custom reduction method to aggregate these sample losses into a single loss for the entire batch? Is it acceptable to include this reduction within the custom loss function and have the function return just a single scalar rather than a vector of losses?
It really depends on your application and goal. A very common approach is to take the reduce_mean of the losses generated over the batch. Some also use reduce_sum, which of course makes the loss value depend on the batch size. A general (and maybe unnecessarily complicated) approach is to store a function that reduces the batch losses to a single value; let's call it reducer. In your loss function, you can call it in the last line, right before return:
class my_loss(keras.losses.Loss):
    def __init__(self, inputs):
        super().__init__()
        # a bunch of assignments
        self.reducer = self._get_reducer_function(inputs)  # or a plain mean function

    def call(self, y_true, y_pred):
        y_batch = ...  # compute the per-sample losses here
        return self.reducer(y_batch)

    def get_config(self):
        return {'input': 1}
Of course you don't need to write it in such a complicated way, but this should give you an idea of how to do it. You can also simply add sample_weights if you need them.
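If all you need is a mean reduction, a much simpler sketch (the class name MeanReducedLoss and the per-sample absolute-error loss are made up for illustration) is to do the reduction inline in call():
import tensorflow as tf
from tensorflow import keras

class MeanReducedLoss(keras.losses.Loss):
    def call(self, y_true, y_pred):
        # one loss value per sample ...
        per_sample = tf.reduce_mean(tf.abs(y_true - y_pred), axis=-1)
        # ... reduced to a single scalar for the whole batch
        return tf.reduce_mean(per_sample)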
I have created a sequential model in CNTK and pass this model into a loss function like the following:
ce = cross_entropy_with_softmax(model, labels)
As mentioned here, and since I have a multilabel classifier, I want to use a proper loss function. The problem is that I cannot find any proper documentation on these loss functions in Python. Is there any suggestion or sample code for this requirement?
I should note that I found these alternatives (logistic and weighted logistic) in the BrainScript language, but not in Python.
"my data has more than one label (three label) and each label has more than two values (30 different values)"
Do I understand correctly that you have 3 network outputs with associated labels, and each one is a 1-in-30 classifier? Then it seems you can just add three cross_entropy_with_softmax() values. Is that what you want?
E.g. if the model function returns a triple (ending in something like return combine([z1, z2, z3])), then your criterion function that you pass to Trainer could look like this (if you don't use Python 3, the syntax is a little different):
from cntk import Function
from cntk.layers.typing import Tensor, SparseTensor

@Function
def my_criterion(input: Tensor[input_dim], labels1: SparseTensor[30],
                 labels2: SparseTensor[30], labels3: SparseTensor[30]):
    z1, z2, z3 = my_model(input).outputs
    loss = cross_entropy_with_softmax(z1, labels1) + \
           cross_entropy_with_softmax(z2, labels2) + \
           cross_entropy_with_softmax(z3, labels3)
    return loss

learner = ...
trainer = Trainer(None, my_criterion, learner)

# in the minibatch loop:
input_mb, L1_mb, L2_mb, L3_mb = my_next_minibatch()
trainer.train_minibatch(my_criterion.argument_map(input_mb, L1_mb, L2_mb, L3_mb))
Update (based on the comments below): If you are using a sequential model, then you are probably interested in taking a sum, over all positions in the sequence, of the loss at each position. cross_entropy_with_softmax is appropriate for the per-position loss, and CNTK will automatically compute the sum of the loss values over all positions in the sequence.
Note that the term multilabel is non-standard here, as it typically refers to problems with multiple binary labels. The wiki page you link to refers to that case, which is different from what you are doing.
Original answer (valid for the actual multilabel case): You will want to use binary_cross_entropy or weighted_binary_cross_entropy. (We decided to rename Logistic when porting it to Python.) At the time of this writing these operations only support {0,1} labels. If your labels are in (0,1), then you will need to define your loss yourself, for example like this:
import cntk as C
# note the leading minus: the loss to minimize is the negative log-likelihood
my_bce = -(label * C.log(model) + (1 - label) * C.log(1 - model))
Currently, most operators are in the cntk.ops package and documented here. The only exception is the sequence-related operators, which reside in cntk.ops.sequence.
We have plans to restructure the operator space (without breaking backwards compatibility) to increase discoverability.
For your particular case, cross_entropy_with_softmax seems to be a reasonable choice, and you can find its documentation with examples here. Please also check out this Jupyter Notebook for a complete example.
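As a minimal usage sketch (assuming CNTK 2.x; the variables are made up for illustration), the per-sample loss for a 3-class problem with one-hot labels can be evaluated like this:
import numpy as np
import cntk as C

z = C.input_variable(3)   # logits produced by the model
y = C.input_variable(3)   # one-hot labels
loss = C.cross_entropy_with_softmax(z, y)
print(loss.eval({z: np.array([[1.0, 2.0, 3.0]], dtype=np.float32),
                 y: np.array([[0.0, 0.0, 1.0]], dtype=np.float32)}))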
I am using DNNRegressor to train my model. I searched the documentation for the loss function used by this wrapper, but I couldn't find it. On a related note, is it possible to change that loss function?
Thank you for your suggestions.
It uses L2 loss (mean squared error) as defined in target_column.py:
def regression_target(label_name=None,
                      weight_column_name=None,
                      target_dimension=1):
  """Creates a _TargetColumn for linear regression.

  Args:
    label_name: String, name of the key in label dict. Can be null if label
      is a tensor (single headed models).
    weight_column_name: A string defining feature column name representing
      weights. It is used to down weight or boost examples during training. It
      will be multiplied by the loss of the example.
    target_dimension: dimension of the target for multilabels.

  Returns:
    An instance of _TargetColumn
  """
  return _RegressionTargetColumn(loss_fn=_mean_squared_loss,
                                 label_name=label_name,
                                 weight_column_name=weight_column_name,
                                 target_dimension=target_dimension)
and currently the API does not support any changes here. However, since it is open source, you can always modify the constructor to call a different function internally, with a different loss.
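For reference, the L2 loss that DNNRegressor effectively minimizes corresponds to this plain TensorFlow sketch (TF 1.x placeholders; the names are made up for illustration):
import tensorflow as tf

labels = tf.placeholder(tf.float32, [None, 1])
predictions = tf.placeholder(tf.float32, [None, 1])
loss = tf.reduce_mean(tf.square(predictions - labels))  # mean squared error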
I want to write a new optimization algorithm for my network on TensorFlow. I hope to implement the Levenberg-Marquardt optimization algorithm, which is now excluded from the TF API. I found poor documentation on how to write a custom optimizer, so I'm asking if someone can give me any advice. Thanks.
The simplest example of an optimizer is probably the gradient descent optimizer. It shows how one creates an instance of the basic optimizer class. The optimizer base class documentation explains what the methods do.
The python side of the optimizers adds new nodes to the graph that compute and apply the gradients being back-propagated. It supplies the parameters that get passed to the ops and does some of the high-level management of the optimizer. Then, you need the actual "Apply" op.
Ops have both a python and a C++ component. Writing a training op is the same (but specialized) as the general process of adding an Op to TensorFlow.
For an example set of training ops that compute and apply gradients, see
python/training/training_ops.py - this is the Python glue for the actual training ops. Note that the code here is mostly about shape inference - the computation is implemented in C++.
The actual math for applying the gradients is handled by an Op (recalling that, in general, ops are written in C++). In this case, the apply gradients ops are defined in core/kernels/training_ops.cc. You can see, for example, the implementation of ApplyGradientDescentOp in there, which references a functor ApplyGradientDescent:
var.device(d) -= grad * lr();
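To make the math concrete, here is a NumPy analogue of that functor's update (illustrative values only):
import numpy as np

var = np.array([1.0, 2.0, 3.0])
grad = np.array([0.1, 0.2, 0.3])
lr = 0.5
var -= lr * grad  # the same in-place gradient descent step: var <- var - lr * grad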
The implementation of the Op itself follows the implementation of any other op as described in the adding-an-op docs.
Before running the TensorFlow session, one should instantiate an Optimizer as seen below:
# Gradient Descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
tf.train.GradientDescentOptimizer is the class that, as the name says, implements the gradient descent algorithm.
The method minimize() is called with a "cost" as its parameter and consists of the two methods compute_gradients() and then apply_gradients().
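A sketch of that decomposition (TF 1.x; cost and learning_rate as above):
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(cost)    # step 1: build the gradient ops
train_op = optimizer.apply_gradients(grads_and_vars)  # step 2: build the update op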
For most (custom) optimizer implementations, the method apply_gradients() needs to be adapted.
This method relies on the (new) Optimizer (class), which we will create, to implement the following methods: _create_slots(), _prepare(), _apply_dense(), and _apply_sparse().
_create_slots() and _prepare() create and initialise additional variables, such as momentum.
_apply_dense() and _apply_sparse() implement the actual Ops, which update the variables.
Ops are generally written in C++. Without having to change the C++ header yourself, you can still return a Python wrapper for some Ops through these methods.
This is done as follows:
def _create_slots(self, var_list):
    # Create slots for allocation and later management of additional
    # variables associated with the variables to train.
    # For example: the first and second moments.
    '''
    for v in var_list:
        self._zeros_slot(v, "m", self._name)
        self._zeros_slot(v, "v", self._name)
    '''

def _apply_dense(self, grad, var):
    # Define your favourite variable update.
    # For example:
    '''
    # Here we apply gradient descent by subtracting from the variables
    # the gradient times the learning_rate (defined in __init__)
    var_update = state_ops.assign_sub(var, self.learning_rate * grad)
    '''
    # The trick is now to pass the Ops to control_flow_ops and
    # eventually group any particular computation of the slots you
    # wish to keep track of.
    # For example:
    '''
    m_t = ...m...  # do something with m and grad
    v_t = ...v...  # do something with v and grad
    '''
    return control_flow_ops.group(*[var_update, m_t, v_t])
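Putting the pieces together, here is a minimal sketch of a complete custom optimizer (TF 1.x internal APIs; the class name PlainSGD is hypothetical, and it implements plain gradient descent with no slots, just to show the skeleton):
from tensorflow.python.ops import control_flow_ops, state_ops
from tensorflow.python.training import optimizer

class PlainSGD(optimizer.Optimizer):
    def __init__(self, learning_rate=0.01, use_locking=False, name="PlainSGD"):
        super(PlainSGD, self).__init__(use_locking, name)
        self.learning_rate = learning_rate

    def _create_slots(self, var_list):
        pass  # no extra variables needed for plain gradient descent

    def _prepare(self):
        pass

    def _apply_dense(self, grad, var):
        # var <- var - learning_rate * grad
        var_update = state_ops.assign_sub(var, self.learning_rate * grad)
        return control_flow_ops.group(*[var_update])

    def _apply_sparse(self, grad, var):
        raise NotImplementedError("Sparse gradients are not supported here.")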
For a more detailed explanation with an example, see this blog post:
https://www.bigdatarepublic.nl/custom-optimizer-in-tensorflow/