I am using Keras with TensorFlow in Python. I have a custom loss function that returns a single number for each sample in a batch (so a vector with length equal to the batch size). How can I also specify a custom reduction method to aggregate these sample losses into a single loss for the entire batch? Is it acceptable to include this reduction within the custom loss function and have the function return just a single scalar rather than a vector of losses?
It really depends on your application and goal. A very common approach is to take a reduce_mean over the per-sample losses in the batch. Some also use reduce_sum, which of course makes the loss value depend on the batch size. A general (and maybe unnecessarily complicated) approach is to store a function that reduces the batch of losses to a single value; let's call it reducer. In your loss function, on the last line, you can call it right before return:
class my_loss(keras.losses.Loss):
    def __init__(self, inputs):
        super().__init__()
        # a bunch of assignments
        self.reducer = self._get_reducer_function(inputs)  # or a plain mean function

    def call(self, y_true, y_pred):
        y_batch = ...  # per-sample losses, shape (batch_size,)
        return self.reducer(y_batch)

    def get_config(self):
        return {'input': 1}
Of course you don't need to make it this complicated, but it should give you an idea of how to do it. You can also simply add sample_weights if you need them.
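As a concrete (and hedged) illustration, a minimal version could look like the sketch below; the squared-error per-sample loss and the tf.reduce_mean reducer are just example choices. Returning a scalar from call is fine, since Keras's own reduction of an already-scalar loss leaves it effectively unchanged:

import tensorflow as tf

class MyLoss(tf.keras.losses.Loss):
    def __init__(self, reducer=tf.reduce_mean, name="my_loss"):
        super().__init__(name=name)
        self.reducer = reducer  # aggregates per-sample losses into one scalar

    def call(self, y_true, y_pred):
        per_sample = tf.reduce_sum(tf.square(y_true - y_pred), axis=-1)  # shape (batch_size,)
        return self.reducer(per_sample)  # single scalar for the whole batch

# model.compile(optimizer="adam", loss=MyLoss())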
Related
I want to apply a custom non-torch function on the final calculated loss before computing the gradients (calling backward()). An example would be to replace torch.mean() on the loss vector with a custom pythonic, non-torch mean function. But doing so breaks the computation graph. I cannot rewrite the custom mean function using torch operators, and I am at a loss as to how to do this. Any suggestions?
In PyTorch you can easily do this by inheriting from torch.autograd.Function: all you need to do is implement your custom forward() and the corresponding backward() methods. Because I don't know the function you intend to write, I'll demonstrate it by implementing the sine function in a way that works with automatic differentiation. Note that you need a way to compute the derivative of your function with respect to its input in order to implement the backward pass.
import torch

class MySin(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp):
        """Compute the forward pass of the custom function."""
        ctx.save_for_backward(inp)  # save activation for the backward pass
        return inp.sin()  # forward value; could also be computed by any other library

    @staticmethod
    def backward(ctx, grad_out):
        """Compute the product of the output gradient with the
        Jacobian of the function evaluated at the input."""
        inp, = ctx.saved_tensors
        grad_inp = grad_out * torch.cos(inp)  # propagate gradient; could also use any other library
        return grad_inp
To use it, bind sin = MySin.apply and call sin on your input.
There is also another example worked out in the documentation.
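Applied to your actual case (replacing torch.mean with a plain-Python mean), a minimal sketch could look like this; MyMean is just an illustrative name, and the backward pass uses the fact that d(mean)/dx_i = 1/N:

import torch

class MyMean(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp):
        ctx.save_for_backward(inp)  # remembered only for its shape in backward
        value = sum(inp.flatten().tolist()) / inp.numel()  # non-torch computation
        return inp.new_tensor(value)

    @staticmethod
    def backward(ctx, grad_out):
        inp, = ctx.saved_tensors
        # Chain rule: each element receives grad_out * 1/N.
        return grad_out * torch.ones_like(inp) / inp.numel()

loss_vec = torch.randn(8, requires_grad=True)
loss = MyMean.apply(loss_vec)
loss.backward()  # loss_vec.grad is now filled with 1/8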
I am trying to write my own training loop for TF2/Keras, following the official Keras walkthrough. The vanilla version works like a charm, but when I try to add the @tf.function decorator to my training step, some memory leak grabs all my memory and I lose control of my machine. Does anyone know what is going on?
The important parts of the code look like this:
@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        logits = siamese_network(x, training=True)
        loss_value = loss_fn(y, logits)
    grads = tape.gradient(loss_value, siamese_network.trainable_weights)
    optimizer.apply_gradients(zip(grads, siamese_network.trainable_weights))
    train_acc_metric.update_state(y, logits)
    return loss_value

@tf.function
def test_step(x, y):
    val_logits = siamese_network(x, training=False)
    val_acc_metric.update_state(y, val_logits)
    val_prec_metric.update_state(y, val_logits)
    val_rec_metric.update_state(y, val_logits)
for epoch in range(epochs):
    step_time = 0
    epoch_time = time.time()
    print("Start of {} epoch".format(epoch))

    for step, (x_batch_train, y_batch_train) in enumerate(train_ds):
        if step > steps_epoch:
            break
        loss_value = train_step(x_batch_train, y_batch_train)
    train_acc = train_acc_metric.result()
    train_acc_metric.reset_states()

    for val_step, (x_batch_val, y_batch_val) in enumerate(test_ds):
        if val_step > validation_steps:
            break
        test_step(x_batch_val, y_batch_val)
    val_acc = val_acc_metric.result()
    val_prec = val_prec_metric.result()
    val_rec = val_rec_metric.result()

    val_acc_metric.reset_states()
    val_prec_metric.reset_states()
    val_rec_metric.reset_states()
If I comment out the @tf.function lines, the memory leak doesn't occur, but each step is about 3 times slower. My guess is that somehow the graph is being created again within each epoch or something like that, but I have no idea how to solve it.
This is the tutorial I am following: https://keras.io/guides/writing_a_training_loop_from_scratch/
tl;dr;
TensorFlow may be generating a new graph for each unique set of argument values passed into the decorated functions. Make sure you are passing consistently-shaped Tensor objects to test_step and train_step instead of python objects.
Details
This is a stab in the dark. While I've never tried @tf.function myself, I did find the following warnings in the documentation:
tf.function also treats any pure Python value as opaque objects, and builds a separate graph for each set of Python arguments that it encounters.
and
Caution: Passing python scalars or lists as arguments to tf.function will always build a new graph. To avoid this, pass numeric arguments as Tensors whenever possible
Finally:
A Function determines whether to reuse a traced ConcreteFunction by computing a cache key from an input's args and kwargs. A cache key is a key that identifies a ConcreteFunction based on the input args and kwargs of the Function call, according to the following rules (which may change):
- The key generated for a tf.Tensor is its shape and dtype.
- The key generated for a tf.Variable is a unique variable id.
- The key generated for a Python primitive (like int, float, str) is its value.
- The key generated for nested dicts, lists, tuples, namedtuples, and attrs is the flattened tuple of leaf-keys (see nest.flatten). (As a result of this flattening, calling a concrete function with a different nesting structure than the one used during tracing will result in a TypeError.)
- For all other Python types the key is unique to the object. This way a function or method is traced independently for each instance it is called with.
What I get from all this is that if you don't pass consistently shaped Tensor objects to your @tf.function-decorated function (for example, if you use Python collections or primitives instead), it is likely that you are creating a new graph version of your function for every distinct argument value you pass in. I'm guessing this could create the memory explosion behavior you're seeing. I can't tell how your test_ds and train_ds objects are being created, but you might want to make sure that they are created such that enumerate(blah_ds) returns tensors like in the tutorial, or at least convert the values to tensors before passing them to your test_step and train_step functions.
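For example (a sketch; the (None, 128) feature shape and int32 labels are assumptions about your data, not taken from your code), you can pin the traced signature so TensorFlow raises instead of silently retracing, and convert incoming values to tensors:

import tensorflow as tf

@tf.function(input_signature=[
    tf.TensorSpec(shape=(None, 128), dtype=tf.float32),  # x: batch of feature vectors
    tf.TensorSpec(shape=(None,), dtype=tf.int32),         # y: batch of labels
])
def train_step(x, y):
    ...  # same body as in the question

# If your datasets yield NumPy arrays or Python lists, convert before calling:
# loss_value = train_step(tf.convert_to_tensor(x_batch_train),
#                         tf.convert_to_tensor(y_batch_train))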
I have a trained neural network model developed using the Keras framework in a Jupyter notebook. It is a regression problem, where I am trying to predict an output variable using some 14 input variables or features.
As a next step, I would like to minimize my output and want to determine what configuration/values these 14 inputs would take to get to the minimal value of the output.
So, essentially, I would like to pass the trained model object as my objective function in a solver, and also a bunch of constraints on the input variables to optimize/minimize the objective.
What is the best Python solver that can help me get there?
Thanks in advance!
So you already have your trained model, which we can think of as f(x) = y.
The standard SciPy method to minimize this is appropriately named scipy.optimize.minimize.
To use it, you just need to adapt your f(x) = y function to fit the API that SciPy uses. That is, the first function argument is the list of params to optimize over. The second argument is optional, and can contain any args that are fixed for the entire optimization (i.e. your trained model).
def score_trained_model(params, *args):
    # Get the model from the fixed args.
    model = args[0]
    # Run the model on the params and return the scalar output.
    return model_predict(model, params)
With this, plus an initial guess, you can use the minimize function now:
# Nelder-Mead is my go-to to start with.
# But it doesn't take advantage of the gradient.
# Something that does, e.g. BFGS, may perform better for your case.
method = 'Nelder-Mead'

# All zeros is fine, but improving this initial guess can help.
guess_params = [0] * 14

# Given a trained model, optimize the inputs to minimize the output.
optim_params = scipy.optimize.minimize(
    score_trained_model,
    guess_params,
    args=(trained_model,),
    method=method,
)
It is possible to supply constraints and bounds to some of the optimization methods. Nelder-Mead does not support constraints (and supports bounds only in newer SciPy versions), but you can simply return a very large error value when a constraint is violated.
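For instance, with a method that does accept bounds, such as L-BFGS-B, you could box-constrain each of the 14 inputs directly (a sketch; the (0, 1) ranges are placeholders for your real constraints):

# Bound each of the 14 inputs to an illustrative (0, 1) range.
bounds = [(0.0, 1.0)] * 14

optim_params = scipy.optimize.minimize(
    score_trained_model,
    guess_params,
    args=(trained_model,),
    method='L-BFGS-B',
    bounds=bounds,
)
print(optim_params.x, optim_params.fun)  # minimizing inputs and the minimal output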
Older answer.
OP wants to optimize the inputs, x, not the hyperparameters.
It sounds like you want to do hyperparameter optimization. My Python library of choice is hyperopt: https://github.com/hyperopt/hyperopt
Given that you already have some training and scoring code, for example:
def train_and_score(args):
    # Unpack args and train your model.
    model = make_model(**args)
    trained = train_model(model, **args)
    # Return the output you want to minimize.
    return score_model(trained)
You can easily use hyperopt to tune parameters like the learning rate, dropout, or choice of activations:
import numpy as np
from hyperopt import fmin, hp, tpe, space_eval, Trials

space = {
    'lr': hp.loguniform('lr', np.log(0.01), np.log(0.5)),
    'dropout': hp.uniform('dropout', 0, 1),
    'activation': hp.choice('activation', ['relu', 'sigmoid']),
}

# Minimize the training score over the space.
trials = Trials()
best = fmin(train_and_score, space, trials=trials, algo=tpe.suggest, max_evals=100)

# Print details about the best results and hyperparameters.
print(best)
print(space_eval(space, best))
There are also libraries that will help you directly integrate this with Keras. A popular choice is hyperas: https://github.com/maxpumperla/hyperas
This question is about coding strategy using TensorFlow. I would like to create a small classifier network made of:
1: an input
2: a simple fully connected layer (W*x + B)
3: an LSTM layer
4: a softmax layer
5: an output
In TensorFlow, to use tf.nn.dynamic_rnn(), we need to feed a batch of sequences to the network. So far, it works perfectly (I love this library).
But as I want to apply a simple layer to each feature vector of my sequences (the 2nd layer in my description), I'm wondering:
Do I precede my LSTM layer with this simple layer and pass both to the tf.nn.dynamic_rnn() operation...
OR
Do I use the function tf.map_fn() twice (once to unpack batches, once to unpack sequences), which, if I understood well, is able to unpack my sequences and apply a layer to each feature row?
Normally, both should give me the same result? If that's the case, which should I use?
Thank you for your time!
I recently encountered a similar scenario, where I'd like to chain recurrent and non-recurrent layers.
Do I precede my LSTM layer with this simple layer and pass both to the
tf.nn.dynamic_rnn() operation...
This won't work. The function dynamic_rnn expects a cell as its first argument. A cell is a class that inherits from tf.nn.rnn_cell.RNNCell. Additionally, the second input argument to dynamic_rnn should be a tensor with at least 3 dimensions, where the first two dimensions are batch and time (time_major=False) or time and batch (time_major=True).
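For reference, a minimal sketch of the expected call (the sizes below are placeholders, not taken from your setup):

import tensorflow as tf

max_time, num_features, num_units = 20, 10, 64  # placeholder sizes

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
inputs = tf.placeholder(tf.float32, [None, max_time, num_features])  # [batch, time, features]
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)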
Do I use the function tf.map_fn() twice (once to unpack batches, once to unpack sequences), which, if I understood well, is able to unpack my sequences and apply a layer to each feature row?
This might work, but it doesn't appear to me to be an efficient and clean solution. Firstly, it should not be necessary to 'unpack batches', as you presumably want to perform some operation on batches of features and time-steps, where each observation in a batch is independent of the others.
My solution to this particular problem was to create a sub-class of tf.nn.rnn_cell.RNNCell. In my case I wanted a simple feedforward layer that would iterate over all of the time steps and that could be used in dynamic_rnn:
import tensorflow as tf

class FeedforwardCell(tf.nn.rnn_cell.RNNCell):
    """A stateless feedforward cell that can be used with MultiRNNCell."""

    def __init__(self, num_units, activation=tf.tanh, dtype=tf.float32):
        self._num_units = num_units
        self._activation = activation
        # Store a dummy state to make dynamic_rnn happy.
        self.dummy = tf.constant([[0.0]], dtype=dtype)

    @property
    def state_size(self):
        return 1

    @property
    def output_size(self):
        return self._num_units

    def zero_state(self, batch_size, dtype):
        return self.dummy

    def __call__(self, inputs, state, scope=None):
        """Basic feedforward: output = activation(W * input)."""
        with tf.variable_scope(scope or type(self).__name__):  # "FeedforwardCell"
            output = self._activation(tf.nn.rnn_cell._linear(
                [inputs], self._num_units, True))
        return output, self.dummy
An instance of this class can be passed, in a list together with "normal" RNN cells, to a tf.nn.rnn_cell.MultiRNNCell initializer. The resulting object can then be passed as the cell argument to dynamic_rnn.
Important to note: dynamic_rnn expects a recurrent cell to return a state when called. I therefore use dummy in FeedforwardCell as a fake state variable.
My solution might not be the smoothest or best way to chain recurrent and non-recurrent layers together. I'd be interested in hearing from other Tensorflow users about their suggestions.
Edit
If you choose to use the sequence_length input argument of dynamic_rnn, then state_size should be self._num_units and the dummy state should have shape [batch_size, self.state_size]. In other words, the state cannot be a scalar. Note that bidirectional_dynamic_rnn requires that the sequence_length argument is not None, whereas dynamic_rnn does not have this requirement. (This is weakly documented in the TF documentation.)
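A hedged sketch of what this Edit describes, written as a small variant of the class above (the name FeedforwardCellWithState is just illustrative):

# Variant for use with sequence_length (or bidirectional_dynamic_rnn);
# only the state-related members change from FeedforwardCell above.
class FeedforwardCellWithState(FeedforwardCell):
    @property
    def state_size(self):
        return self._num_units

    def zero_state(self, batch_size, dtype):
        # Batch-sized dummy state with shape [batch_size, state_size].
        return tf.zeros([batch_size, self.state_size], dtype=dtype)

    def __call__(self, inputs, state, scope=None):
        output, _ = super(FeedforwardCellWithState, self).__call__(inputs, state, scope)
        return output, tf.zeros_like(state)  # keep the returned state batch-sized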
I am using DNNRegressor to train my model. I searched the documentation for the loss function used by this wrapper, but I couldn't find it. Also, is it possible to change that loss function?
Thank you for your suggestions.
It uses L2 loss (mean squared error) as defined in target_column.py:
def regression_target(label_name=None,
                      weight_column_name=None,
                      target_dimension=1):
  """Creates a _TargetColumn for linear regression.

  Args:
    label_name: String, name of the key in label dict. Can be null if label
      is a tensor (single headed models).
    weight_column_name: A string defining feature column name representing
      weights. It is used to down weight or boost examples during training. It
      will be multiplied by the loss of the example.
    target_dimension: dimension of the target for multilabels.

  Returns:
    An instance of _TargetColumn
  """
  return _RegressionTargetColumn(loss_fn=_mean_squared_loss,
                                 label_name=label_name,
                                 weight_column_name=weight_column_name,
                                 target_dimension=target_dimension)
and currently the API does not support any changes here. However, since it is open source, you can always modify the constructor to call a different function internally, with a different loss.
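As an illustration only, a modified loss might look roughly like the following; the exact loss_fn signature expected by _RegressionTargetColumn may differ across TF versions, so verify it against the source before relying on this:

import tensorflow as tf

# Hypothetical per-example absolute-error loss, mirroring the role of
# _mean_squared_loss above; the (logits, target) signature is an assumption.
def _mean_absolute_loss(logits, target):
    target = tf.reshape(target, tf.shape(logits))  # align shapes with the logits
    return tf.abs(logits - target)

# ...then, in your modified copy of target_column.py:
# return _RegressionTargetColumn(loss_fn=_mean_absolute_loss,
#                                label_name=label_name,
#                                weight_column_name=weight_column_name,
#                                target_dimension=target_dimension)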