Use generator in loss function - python

I need to incorporate additional information into a Keras loss function that depends on the current batch. Since Keras losses only take two arguments, I considered adding this information by making the loss function call next() on a generator object. However, the generator is only called once (probably when adding the loss function in model.compile()).
Here is a sample code:
data_batches = yield_data_batches()
meta_batches = yield_meta_batches()
....

def loss_function(x, y):
    meta_x, meta_y = next(meta_batches)
    x *= meta_x  # component-wise matrix multiplication
    y *= meta_y  # component-wise matrix multiplication
    return mse(x, y)

model.compile(loss=loss_function, ...)
model.fit_generator(generator=data_batches, ...)
Is there a way to make the loss function get a new meta_batch each time it is evaluated on a data_batch? Or is there another way to incorporate this meta information into the loss function?
Clarification:
The meta_x and meta_y are binary matrices that cancel out certain elements of the prediction, as those elements should not count toward the loss.
For example:
y_true = (a,b,c,0)
y_pred = (d,e,f,g)
y_meta = (1,1,1,0)
Now, y_pred * y_meta should cancel out g so that it does not count toward the loss.

This does not work, since the loss function is compiled once and added to the compute graph; the Python call to next() runs only at graph-construction time. Your loss function may only depend on y_true and y_pred.
You can either incorporate this information in y_true, or weight the resulting loss with sample weights.
Your approach would be equivalent to a combination of both:
Assuming positive weights a and b (you call them meta_x and meta_y):
|a*x - b*y| = a * |x - (b/a)*y|, so you just scale y_pred by b/a and use a as the sample weight.
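A minimal sketch of the "incorporate it in y_true" option (not from the answer; names and shapes are illustrative): have the generator concatenate the binary mask onto y_true, then split it off inside the loss:
from keras import backend as K

def masked_mse(y_true_with_mask, y_pred):
    n = K.int_shape(y_pred)[-1]       # number of real targets
    y_true = y_true_with_mask[:, :n]  # actual targets
    mask = y_true_with_mask[:, n:]    # binary mask, same shape as y_pred
    return K.mean(K.square(mask * (y_true - y_pred)), axis=-1)
The generator would then yield np.concatenate([y, mask], axis=-1) as its labels, so a fresh mask arrives with every batch.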

Related

How to randomly set inputs to zero in keras during training autoencoder (callback)?

I am training 2 autoencoders with 2 separate input paths jointly and I would like to randomly set one of the input paths to zero.
I use tensorflow with keras backend (functional API).
I am computing a joint loss (sum of two losses) for backpropagation.
A -> A' & B -> B'
loss => l2(A,A')+l2(B,B')
networks taking A and B are connected in latent space.
I would like to randomly set A or B to zero and compute the loss only on the remaining path: if input path A is set to zero, the loss should be computed using only the outputs of path B, and vice versa; e.g.:
0 -> A' & B ->B'
loss: l2(B,B')
How do I randomly set input path to zero? How do I write a callback which does this?
Maybe try the following:
import random

def decision(probability):
    return random.random() < probability
Define a method that makes a random decision based on a given probability, and make your loss calculation depend on this decision:
if current_epoch == random.choice(epochs):
    keep_mask = tf.ones_like(A.input, dtype=tf.float32)
    throw_mask = tf.zeros_like(A.input, dtype=tf.float32)
    if decision(probability=0.5):
        total_loss = tf.reduce_sum(reconstruction_loss_a * keep_mask
                                   + reconstruction_loss_b * throw_mask)
    else:
        total_loss = tf.reduce_sum(reconstruction_loss_a * throw_mask
                                   + reconstruction_loss_b * keep_mask)
else:
    total_loss = tf.reduce_sum(reconstruction_loss_a + reconstruction_loss_b)
I assume that you do not want to set one of the paths to zero every time you update your model parameters, since then there is a risk that one or even both models will not be sufficiently trained. Also note that I use the input of A to create the zeros_like and ones_like tensors, assuming that both inputs have the same shape; if this is not the case, it can easily be adjusted.
Depending on your goal, you may also consider replacing the input of A or B with a random tensor, e.g. tf.random.normal, based on a random decision. This injects noise into your model, which may be desirable, as your model would be forced to look into the latent space to try to reconstruct the original input. That means you would still calculate the reconstruction loss with A.input and A.output, but in reality your model never received A.input, only the random tensor.
Note that this answer serves as a simple conceptual example. A working example with Tensorflow can be found here.
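A bare-bones sketch of that noise-replacement idea (maybe_replace_with_noise and a_input are illustrative names, not from the answer):
import tensorflow as tf

def maybe_replace_with_noise(a_input, prob=0.5):
    # With probability `prob`, swap the real (float32) input for Gaussian
    # noise of the same shape; the reconstruction loss still targets the
    # real input.
    use_noise = tf.random.uniform([]) < prob
    return tf.cond(use_noise,
                   lambda: tf.random.normal(tf.shape(a_input)),
                   lambda: a_input)
Because the decision is drawn inside the graph, a new choice is made on every forward pass rather than once at graph construction.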
You can simply set an input to 0:
A = A * random.choice([0, 1])
This code can be used inside a loss function.

Custom loss function in tensorflow using list as penalty

I am new to tensorflow and have problems defining a custom loss function for a customer churn problem, which includes a list of values as penalty.
So far, I replicated a mean squared error function that penalizes wrong predictions with an integer.
def rfm_penalty(y_true, y_pred):
    penalty = 2  # integer placeholder, to be replaced by a list
    loss = tf.where(tf.less(y_true * y_pred, 0),
                    penalty * tf.square(y_true - y_pred),  # penalize negative (wrong) preds
                    tf.square(y_true - y_pred))            # no penalty for positive preds
    return tf.reduce_mean(loss, axis=-1)
This one works, but I'd like to modify it: Previously, I calculated a metric that measures the value of a customer, called RFM (for recency, frequency and monetary value of past purchases). This metric is an integer value from 3 to 12 that sums up the three metrics R, F and M. It is stored in a feature column of my df_train.
df_train['RFM_Score'] = [3,6,5,9,11,12,4,...,4] # same dimensions as y_true
I would like to use this feature column (or list) as the penalty, thus penalizing wrong predictions more heavily for highly valuable customers. I would be happy about any idea on how to do that, ideally combined with a sigmoid function, as it's a binary classification case.
Thank you!
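(A standard way to get exactly this per-customer weighting, sketched under the assumption that model, X_train and y_train already exist: skip the custom penalty and pass the scores as Keras sample weights, which scale each sample's contribution to the loss.)
# Keras multiplies each sample's loss by its weight.
rfm_weights = df_train['RFM_Score'].values.astype('float32')

model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X_train, y_train, sample_weight=rfm_weights, epochs=10)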

Incorporate side conditions into Keras neural network

I want to train my neural network (in Keras) with an additional condition on the output elements.
An example:
Minimize my loss function MSE between network output y_pred and y_true.
Additionally, ensure that the norm of y_pred is less than or equal to 1.
Without the condition, the task is straightforward.
Note: The condition is not necessarily the vector norm of y_pred.
How can I implement the additional condition/restriction in a Keras (or maybe Tensorflow) model?
In principle, tensorflow (and keras) don't allow you to add hard constraints to your model.
You have to convert your invariant (norm <= 1) into a penalty function that is added to the loss. This could look like this:
y_norm = tf.norm(y_pred)
norm_loss = tf.where(y_norm > 1, y_norm, 0.0)
total_loss = mse + norm_loss
Look at the docs of tf.where. If your prediction has a norm bigger than one, backpropagation tries to minimize the norm. If it is less than or equal to one, this part of the loss is simply 0 and no gradient is produced.
But this can be very hard to optimize: your predictions could oscillate around a norm of 1. It is also possible to add a factor, total_loss = mse + 1000 * norm_loss, but be very careful with this, as it makes optimization even harder.
In the example above, the norm contributes linearly to the loss once it exceeds one; this is l1-style regularization. You could also square it, which would make it l2-style.
In your specific case, you could get creative. Why not normalize both your predictions and the targets to norm one (just a suggestion, it might be a bad idea)?
loss = mse(y_pred / tf.norm(y_pred), y_target / np.linalg.norm(y_target))
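For completeness, a self-contained sketch of the penalty approach (a slight variant: it penalizes only the excess over 1 via a relu hinge, rather than the whole norm):
import tensorflow as tf

def mse_with_norm_penalty(y_true, y_pred, weight=1.0):
    # MSE plus a hinge penalty that is active only while ||y_pred|| > 1.
    mse = tf.reduce_mean(tf.square(y_true - y_pred))
    excess = tf.nn.relu(tf.norm(y_pred, axis=-1) - 1.0)  # 0 when the constraint holds
    return mse + weight * tf.reduce_mean(excess)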

Compute gradients for each time step of tf.while_loop

Given a TensorFlow tf.while_loop, how can I calculate the gradient of x_out with respect to all weights of the network for each time step?
network_input = tf.placeholder(tf.float32, [None])
steps = tf.constant(0.0)
weight_0 = tf.Variable(1.0)
layer_1 = network_input * weight_0

def condition(steps, x):
    return steps <= 5

def loop(steps, x_in):
    weight_1 = tf.Variable(1.0)
    x_out = x_in * weight_1
    steps += 1
    return [steps, x_out]

_, x_final = tf.while_loop(
    condition,
    loop,
    [steps, layer_1]
)
Some notes
In my network the condition is dynamic: different runs execute the while loop a different number of times.
Calling tf.gradients(x, tf.trainable_variables()) crashes with AttributeError: 'WhileContext' object has no attribute 'pred'. It seems the only way to use tf.gradients within the loop is to calculate the gradient with respect to weight_1 and the value of x_in at the current time step only, without backpropagating through time.
In each time step, the network is going to output a probability distribution over actions. The gradients are then needed for a policy gradient implementation.
You can't call tf.gradients inside tf.while_loop in Tensorflow (see this and this); I found this out the hard way when I was trying to build conjugate gradient descent entirely inside the Tensorflow graph.
But if I understand your model correctly, you could make your own version of an RNNCell and wrap it in tf.nn.dynamic_rnn; the actual cell implementation will be a little complex, though, since you need to evaluate a condition dynamically at runtime.
For starters, you can take a look at Tensorflow's dynamic_rnn code here.
Alternatively, dynamic graphs have never been Tensorflow's strong suit, so consider using other frameworks like PyTorch, or try out eager execution and see if that helps.
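For orientation, a bare-bones custom cell could look roughly like this (a TF1-style sketch; StepCell and its single weight are illustrative, not from the question):
import tensorflow as tf

class StepCell(tf.nn.rnn_cell.RNNCell):
    """Applies one learned multiplicative step per time step (illustrative)."""

    def __init__(self, num_units):
        super(StepCell, self).__init__()
        self._num_units = num_units

    @property
    def state_size(self):
        return self._num_units

    @property
    def output_size(self):
        return self._num_units

    def build(self, inputs_shape):
        # One trainable weight, shared across all time steps.
        self._weight = self.add_variable("weight", shape=[],
                                         initializer=tf.ones_initializer())
        self.built = True

    def call(self, inputs, state):
        # Fold the input into the running state and scale by the shared weight.
        new_state = (state + inputs) * self._weight
        return new_state, new_state

# inputs: [batch, time, num_units]; tf.gradients then backpropagates
# through every time step of the unrolled loop:
# outputs, final_state = tf.nn.dynamic_rnn(StepCell(1), inputs, dtype=tf.float32)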

Writing a custom loss function without y_true in Keras

I'm implementing a triplet loss function in Keras. In general, loss functions take predicted values with ground truth as arguments. But triplet loss doesn't use labels, just the output. I tried to write the function with just one parameter:
def triplet_loss(y_pred):
    margin = 1
    return K.mean(K.square(y_pred[0]) - K.square(y_pred[1]) + margin)
It failed with: triplet_loss() takes 1 argument but 2 were given (in score_array = fn(y_true, y_pred)). When I write the function with two arguments, y_true and y_pred, the program runs without error. Why is that? Should I just implement this function with these two arguments even though y_true won't be used? Is this correct, or is there another way of doing it?
Well.... simply don't use the ground truth:
def triplet_loss(y_true, y_pred):
    # all your code as it is
It's not very usual to train networks without ground truth; when we expect a network to learn something, there is very often a ground truth. If you don't have one, simply ignore the argument.
Also, if y_true is ignored, what are you passing to the fit method? Just a dummy array?
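(A dummy array is indeed the usual workaround; a minimal sketch, where model, anchors/positives/negatives and num_samples are illustrative names:)
import numpy as np

# y_true is ignored by triplet_loss, so any array whose first dimension
# matches the number of samples satisfies Keras' API.
dummy_y = np.zeros((num_samples, 1))
model.compile(optimizer='adam', loss=triplet_loss)
model.fit([anchors, positives, negatives], dummy_y, epochs=10)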
Implement it via the K.function method:
from keras import backend as K
from keras.optimizers import Adam

output_tensor = your_model(input_tensor)
total_loss = K.mean(K.abs(input_tensor - output_tensor))
nn_train = K.function([input_tensor], [total_loss],
                      Adam(lr=5e-5, beta_1=0.5, beta_2=0.999).get_updates(total_loss, your_model.trainable_weights))
loss, = nn_train([input])
