So my question is, if I have something like:
model = Model(inputs = input, outputs = [y1,y2])
model.compile(loss = my_loss ...)
I have only seen my_loss given as a dictionary of independent losses, with the final loss defined as their sum. But in a multitask model, can I define a single loss function that takes all the predicted and true values, so that I can multiply them (for instance)?
This is the loss I am trying to define:
def my_loss(y_true1, y_true2, y_pred1, y_pred2):
    final_loss = binary_crossentropy(y_true1, y_pred1) + y_true1 * categorical_crossentropy(y_true2, y_pred2)
    return final_loss
Usually, the parameters of the loss function are y_true and y_pred, where y_pred is either y1 or y2. But now I need both to compute the loss, so how can I define this loss function and pass it all four parameters: y_true1, y_true2, y_pred1, y_pred2?
This is my current model, whose loss I want to change:
x = Input(shape=(n, ))
shared = Dense(32)(x)
sub1 = Dense(16)(shared)
sub2 = Dense(16)(shared)
y1 = Dense(1, activation='sigmoid')(sub1)
y2 = Dense(4, activation='softmax')(sub2)
model = Model(inputs = x, outputs = [y1,y2])
model.compile(loss = ['binary_crossentropy', 'categorical_crossentropy'] ...) # THIS IS THE LINE I WANT TO CHANGE
Thanks!
I'm not sure if I'm understanding correctly, but I'll try.
The loss function must contain both the predicted and the actual data -- it's a way to measure the error between what your model is predicting and the true data. However, the predicted and actual data do not need to be one-dimensional. You can make y_pred a tensor that contains both y_pred1 and y_pred2. Likewise, y_true can be a tensor that contains both y_true1 and y_true2.
As far as I know, loss functions should return a single number. That's why loss functions often have a mean or a sum to add up all of the losses for individual data points.
Here's an example of mean square error that will work for more than 1D:
import keras.backend as K
def my_loss(y_true, y_pred):
    # this example is mean squared error
    # works if y_pred and y_true are greater than 1D
    return K.mean(K.square(y_pred - y_true))
Here's another example of a loss function that I think is closer to your question (although I cannot comment on whether or not it's a good loss function):
def my_loss(y_true, y_pred):
    # calculate mean(abs(y_pred1*y_pred2 - y_true1*y_true2))
    # this will work for 2D inputs of y_pred and y_true
    return K.mean(K.abs(K.prod(y_pred, axis = 1) - K.prod(y_true, axis = 1)))
Update:
You can concatenate two outputs into a single tensor with keras.layers.Concatenate. That way you can still have a loss function with only two arguments.
In the model you wrote above, the y1 output shape is (None, 1) and the y2 output shape is (None, 4). Here's an example of how you could write your model so that the output is a single tensor that concatenates y1 and y2 into a shape of (None, 5):
from keras import Model
from keras.layers import Input, Dense
from keras.layers import Concatenate
input_layer = Input(shape=(n, ))
shared = Dense(32)(input_layer)
sub1 = Dense(16)(shared)
sub2 = Dense(16)(shared)
y1 = Dense(1, activation='sigmoid')(sub1)
y2 = Dense(4, activation='softmax')(sub2)
mergedOutput = Concatenate()([y1, y2])  # shape (None, 5)
model = Model(inputs=input_layer, outputs=mergedOutput)
Below, I show an example for how you could rewrite your loss function. I wasn't sure which of the 5 columns of the output to call y_true1 vs. y_true2, so I guessed that y_true1 was column 1 and y_true2 was the remaining 4 columns. The same column structure would apply to y_pred1 and y_pred2.
from keras import losses
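def my_loss(y_true, y_pred):
    # Slice with 0:1 (not 0) to keep a trailing axis, so binary_crossentropy
    # returns one value per sample instead of averaging over the whole batch.
    bce = losses.binary_crossentropy(y_true[:, 0:1], y_pred[:, 0:1])
    cce = losses.categorical_crossentropy(y_true[:, 1:], y_pred[:, 1:])
    final_loss = bce + y_true[:, 0] * cce
    return final_loss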
Finally, you can compile the model without any major changes from normal:
model.compile(optimizer='adam', loss=my_loss)
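To check the arithmetic of the concatenated-column loss outside of Keras, here is a minimal NumPy sketch of the same per-sample computation. It is a stand-in for the backend ops, not the Keras implementation: the function name combined_loss_np, the clipping constant EPS, and the sample values are assumptions made for illustration, and column 0 is assumed to hold the binary task as guessed above.

```python
import numpy as np

EPS = 1e-7  # assumed clipping constant to avoid log(0)

def combined_loss_np(y_true, y_pred):
    """Per-sample loss for (batch, 5) arrays: column 0 is the binary task,
    columns 1..4 are the one-hot 4-class task."""
    t1 = y_true[:, 0]
    p1 = np.clip(y_pred[:, 0], EPS, 1 - EPS)
    # binary cross-entropy on the first column
    bce = -(t1 * np.log(p1) + (1 - t1) * np.log(1 - p1))
    # categorical cross-entropy on the remaining four columns
    p2 = np.clip(y_pred[:, 1:], EPS, 1.0)
    cce = -np.sum(y_true[:, 1:] * np.log(p2), axis=1)
    # the categorical term only counts when the binary label is 1
    return bce + t1 * cce

# hypothetical batch of two samples
y_true = np.array([[1, 0, 1, 0, 0],
                   [0, 0, 0, 1, 0]], dtype=float)
y_pred = np.array([[0.9, 0.1, 0.7, 0.1, 0.1],
                   [0.2, 0.25, 0.25, 0.25, 0.25]], dtype=float)
loss = combined_loss_np(y_true, y_pred)
```

For the second sample the binary label is 0, so only the binary cross-entropy term contributes.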
Related
I'm having trouble implementing a custom loss function in a neural network I'm building in TensorFlow. I want to use one of my features as part of the loss function, so I've tried using model.add_loss instead of giving loss a value in the model.compile function.
My data looks like this:
import tensorflow as tf
import numpy as np
from tensorflow.keras import layers
feature_df = np.random.rand(600, 9)  # placeholder: stands in for the real data, an array of shape (600, 9)
training, test, = feature_df[:350,:], feature_df[350:,:]
x_train = training[:,[0,1,2,3,4,5,6]]
y_train = training[:,8]
loss_inp_train = training[:,[6]]
x_test = test[:,[0,1,2,3,4,5,6]]
y_test = test[:,8]
loss_inp_test = test[:,[6]]
I want to use a custom loss function because it's not necessarily the MSE I'm interested in minimizing. I want to maximize the profitability of this model, which depends on whether y_true and y_pred fall above or below loss_inp_train.
I've tried creating a loss function that looks like this:
def custom_loss(y_pred, y_true, inp):
    loss = 0
    if (y_pred < inp):
        if y_true < inp:
            loss = loss + .9
        else:
            loss = loss - 1
    else:
        if y_true > inp:
            loss = loss + .9
        else:
            loss = loss - 1
    loss = loss*-1
    return(loss)
And the Model
model = tf.keras.Sequential([
    normalize,
    layers.Dense(18),
    layers.Dense(1)
])
model.add_loss(profit_loss(y_pred,y_train,loss_inp_train))
model.compile(loss = None,
              optimizer = tf.optimizers.Adam())
I'm having trouble feeding the loss function the output of the model. I'm still new to TensorFlow. Whenever I've accessed predicted values, it has been after training, using model.predict, but obviously I don't have a fitted model yet. How do I reference both a feature of the training data and y_true, y_pred in a function?
Probably the best way to do this is to define a custom loss. Unfortunately, I'm not sure how to handle nested if statements like yours; probably with a combination of K.switch calls. I can try to give you a partial solution that takes into consideration only a single if statement. Let's take the following simplified code:
loss = 0
if (y_pred < inp):
    loss = ...  # assignment 1
else:
    loss = ...  # assignment 2
In this case the loss function could be converted into this:
from tensorflow.keras import backend as K

def profit_loss(inp):
    def loss_function(y_true, y_pred):
        condition = K.greater(y_pred - inp, 0)  # True where y_pred > inp
        loss1 = ...  # assignment 1, used where y_pred <= inp
        loss2 = ...  # assignment 2, used where y_pred > inp
        loss = K.switch(condition, loss2, loss1)
        return -K.sum(loss)
    return loss_function

model.compile(optimizer = tf.optimizers.Adam(), loss=profit_loss(inp))
This way y_true and y_pred are automatically handled and you just have to feed the inp argument.
Hope this helps getting you closer to solving the problem.
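For the nested if statements from the question, one possible route is to express both levels as element-wise selections. The sketch below uses NumPy's np.where as a stand-in for a backend switch, only to show the logic; the function name profit_loss_np and the sample values are assumptions for illustration. Note also that a loss built purely from comparisons is piecewise constant in y_pred, so gradient-based training would likely need a smooth surrogate.

```python
import numpy as np

def profit_loss_np(y_true, y_pred, inp):
    """Element-wise version of the nested if/else from the question."""
    pred_below = y_pred < inp
    # outer branch picks which comparison on y_true counts as a hit
    hit = np.where(pred_below, y_true < inp, y_true > inp)
    # inner branch: +0.9 profit on a hit, -1 otherwise
    profit = np.where(hit, 0.9, -1.0)
    # the loss is the negated profit, as in the original function
    return -profit

# hypothetical values: two hits followed by two misses
y_true = np.array([1.0, 3.0, 1.0, 3.0])
y_pred = np.array([0.0, 4.0, 4.0, 0.0])
inp = np.full(4, 2.0)
losses = profit_loss_np(y_true, y_pred, inp)
```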
I am using TensorFlow/Keras and I would like to use the input in the loss function,
as per this answer:
Custom loss function in Keras based on the input data
I have created my loss function like this:
def custom_Loss_with_input(inp_1):
    def loss(y_true, y_pred):
        b = K.mean(inp_1)
        return y_true - b
    return loss
and set up the model with the layers and all ending like this
model = Model(inp_1, x)
model.compile(loss=custom_Loss_with_input(inp_1), optimizer= Ada)
return model
Nevertheless, i get the following error:
TypeError: Cannot convert a symbolic Keras input/output to a numpy array. This error may indicate that you're trying to pass a symbolic value to a NumPy call, which is not supported. Or, you may be trying to pass Keras symbolic inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically converting the API call to a lambda layer in the Functional Model.
Any advice on how to eliminate this error?
Thanks in advance
You can use add_loss to pass external layers to your loss, in your case the input tensor.
Here is an example:
import numpy as np
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

def CustomLoss(y_true, y_pred, input_tensor):
    b = K.mean(input_tensor)
    return K.mean(K.square(y_true - y_pred)) + b

X = np.random.uniform(0,1, (1000,10))
y = np.random.uniform(0,1, (1000,1))
inp = Input(shape=(10,))
hidden = Dense(32, activation='relu')(inp)
out = Dense(1)(hidden)
target = Input((1,))  # the true values are fed in as a second input
model = Model([inp,target], out)
model.add_loss( CustomLoss( target, out, inp ) )
model.compile(loss=None, optimizer='adam')
model.fit(x=[X,y], y=None, epochs=3)
If your loss is composed of different parts and you want to track them you can add different losses corresponding to the loss parts. In this way, the losses are printed at the end of each epoch and are stored in model.history.history. Remember that the final loss minimized during training is the sum of the various loss parts.
def ALoss(y_true, y_pred):
    return K.mean(K.square(y_true - y_pred))

def BLoss(input_tensor):
    b = K.mean(input_tensor)
    return b
X = np.random.uniform(0,1, (1000,10))
y = np.random.uniform(0,1, (1000,1))
inp = Input(shape=(10,))
hidden = Dense(32, activation='relu')(inp)
out = Dense(1)(hidden)
target = Input((1,))
model = Model([inp,target], out)
model.add_loss(ALoss( target, out ))
model.add_metric(ALoss( target, out ), name='a_loss')
model.add_loss(BLoss( inp ))
model.add_metric(BLoss( inp ), name='b_loss')
model.compile(loss=None, optimizer='adam')
model.fit(x=[X,y], y=None, epochs=3)
To use the model in inference mode (removing the target from inputs):
final_model = Model(model.input[0], model.output)
final_model.predict(X)
I'm trying to reproduce the architecture of the network proposed in this publication in TensorFlow. Being a total beginner to this, I've been using this tutorial as a base to work on, using tensorflow==2.3.2.
To train this network, they use a loss which involves outputs from two branches of the network at the same time, which made me look into custom loss functions in Keras. I understand that you can define your own, as long as the definition of the function looks like the following:
def custom_loss(y_true, y_pred):
I also understood that you could give other arguments like so:
def loss_function(margin=0.3):
    def custom_loss(y_true, y_pred):
        # And now you can use margin
        ...
    return custom_loss
You then just have to call these while compiling your model. When it comes to using multiple outputs, the most common approach seems to be the one proposed here, where you give several loss functions, one being called for each of your outputs.
However, I could not find a solution to give several outputs to a loss function, which is what I need here.
To explain further, here is a minimal working example showing what I've tried, which you can try for yourself in this Colab.
import os
import tensorflow as tf
import keras.backend as K
from tensorflow.keras import datasets, layers, models, applications, losses
from tensorflow.keras.preprocessing import image_dataset_from_directory
_URL = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'
path_to_zip = tf.keras.utils.get_file('cats_and_dogs.zip', origin=_URL, extract=True)
PATH = os.path.join(os.path.dirname(path_to_zip), 'cats_and_dogs_filtered')
train_dir = os.path.join(PATH, 'train')
validation_dir = os.path.join(PATH, 'validation')
BATCH_SIZE = 32
IMG_SIZE = (160, 160)
IMG_SHAPE = IMG_SIZE + (3,)
train_dataset = image_dataset_from_directory(train_dir,
shuffle=True,
batch_size=BATCH_SIZE,
image_size=IMG_SIZE)
validation_dataset = image_dataset_from_directory(validation_dir,
shuffle=True,
batch_size=BATCH_SIZE,
image_size=IMG_SIZE)
data_augmentation = tf.keras.Sequential([
layers.experimental.preprocessing.RandomFlip('horizontal'),
layers.experimental.preprocessing.RandomRotation(0.2),
])
preprocess_input = applications.resnet50.preprocess_input
base_model = applications.ResNet50(input_shape=IMG_SHAPE,
include_top=False,
weights='imagenet')
base_model.trainable = True
conv = layers.Conv2D(filters=128, kernel_size=(1,1))
global_pooling = layers.GlobalAveragePooling2D()
horizontal_pooling = layers.AveragePooling2D(pool_size=(1, 5))
reshape = layers.Reshape((-1, 128))
def custom_loss(y_true, y_pred):
    print(y_pred.shape)
    # Do some stuff involving both outputs
    # Returning something trivial here for correct behavior
    return K.mean(y_pred)
inputs = tf.keras.Input(shape=IMG_SHAPE)
x = data_augmentation(inputs)
x = preprocess_input(x)
x = base_model(x, training=True)
first_branch = global_pooling(x)
second_branch = conv(x)
second_branch = horizontal_pooling(second_branch)
second_branch = reshape(second_branch)
model = tf.keras.Model(inputs, [first_branch, second_branch])
base_learning_rate = 0.0001
model.compile(optimizer=tf.keras.optimizers.Adam(lr=base_learning_rate),
loss=custom_loss,
metrics=['accuracy'])
model.summary()
initial_epochs = 10
history = model.fit(train_dataset,
epochs=initial_epochs,
validation_data=validation_dataset)
While doing so, I thought that the y_pred given to the loss function would be a list containing both outputs. However, when running it, this is what I got in stdout:
Epoch 1/10
(None, 2048)
(None, 5, 128)
What I understand from this is that the loss function is called with every output, one by one, instead of being called once with all the outputs, which means I can't define a loss that would use both the outputs at the same time. Is there any way to achieve this?
Please let me know if I'm unclear, or if you need further details.
I had the same problem trying to implement a triplet loss function.
I referred to Keras's implementation of a Siamese network with a triplet loss function, but something didn't work out and I had to implement the network myself.
def get_siamese_model(input_shape, conv2d_filters):
    # Define the tensors for the input images
    anchor_input = Input(input_shape, name="Anchor_Input")
    positive_input = Input(input_shape, name="Positive_Input")
    negative_input = Input(input_shape, name="Negative_Input")
    body = build_body(input_shape, conv2d_filters)
    # Generate the feature vectors for the images
    encoded_a = body(anchor_input)
    encoded_p = body(positive_input)
    encoded_n = body(negative_input)
    distance = DistanceLayer()(encoded_a, encoded_p, encoded_n)
    # Connect the inputs with the outputs
    siamese_net = Model(inputs=[anchor_input, positive_input, negative_input],
                        outputs=distance)
    return siamese_net
The "bug" was in the DistanceLayer implementation that Keras posted (also at the link above):
class DistanceLayer(tf.keras.layers.Layer):
    """
    This layer is responsible for computing the distance between the anchor
    embedding and the positive embedding, and the anchor embedding and the
    negative embedding.
    """
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def call(self, anchor, positive, negative):
        ap_distance = tf.math.reduce_sum(tf.math.square(anchor - positive), axis=1, keepdims=True, name='ap_distance')
        an_distance = tf.math.reduce_sum(tf.math.square(anchor - negative), axis=1, keepdims=True, name='an_distance')
        return (ap_distance, an_distance)
When I was training the model, the loss function took only one of the vectors ap_distance or an_distance.
Finally, the fix was to concatenate the vectors together (along axis=1 in this case) and take them apart again in the loss function:
    def call(self, anchor, positive, negative):
        ap_distance = tf.math.reduce_sum(tf.math.square(anchor - positive), axis=1, keepdims=True, name='ap_distance')
        an_distance = tf.math.reduce_sum(tf.math.square(anchor - negative), axis=1, keepdims=True, name='an_distance')
        return tf.concat([ap_distance, an_distance], axis=1)
And in my custom loss:
def get_loss(margin=1.0):
    def triplet_loss(y_true, y_pred):
        # The output of the network is NOT a tuple, but a matrix of shape (batch_size, 2),
        # containing the distances between the anchor and the positive example,
        # and the anchor and the negative example.
        ap_distance = y_pred[:, 0]
        an_distance = y_pred[:, 1]
        # Computing the Triplet Loss by subtracting both distances and
        # making sure we don't get a negative value.
        loss = tf.math.maximum(ap_distance - an_distance + margin, 0.0)
        # tf.print("\n", ap_distance, an_distance)
        # tf.print(f"\n{loss}\n")
        return loss
    return triplet_loss
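To see the concatenate-then-slice pattern on concrete numbers, here is a small NumPy sketch that mirrors the two steps above: build a (batch_size, 2) distance matrix, then split it inside the loss. It is a stand-in for the TensorFlow ops, and the function name triplet_loss_np and the embeddings are made up for illustration.

```python
import numpy as np

def triplet_loss_np(y_pred, margin=1.0):
    """y_pred has shape (batch_size, 2): column 0 is the anchor-positive
    distance, column 1 is the anchor-negative distance."""
    ap_distance = y_pred[:, 0]
    an_distance = y_pred[:, 1]
    return np.maximum(ap_distance - an_distance + margin, 0.0)

# hypothetical 2-D embeddings for a batch of two triplets
anchor = np.array([[0.0, 0.0], [1.0, 1.0]])
positive = np.array([[0.0, 1.0], [1.0, 1.0]])
negative = np.array([[3.0, 0.0], [1.0, 1.5]])

# squared distances, kept as (batch_size, 1) columns
ap = np.sum((anchor - positive) ** 2, axis=1, keepdims=True)
an = np.sum((anchor - negative) ** 2, axis=1, keepdims=True)
# merge both columns into one array so a two-argument loss can see both
distances = np.concatenate([ap, an], axis=1)
loss = triplet_loss_np(distances, margin=1.0)
```

The first triplet is already separated by more than the margin, so its loss is zero; the second is not, so it contributes a positive loss.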
Here is an easy way to achieve this, using the loss_weights parameter. We can weight the losses of multiple outputs equally so that we get the combined loss. So, for two outputs, the total is
total_loss = 1*loss_output1 + 1*loss_output2
In your case, your network has two outputs, named reshape and global_average_pooling2d. You can now proceed as follows:
# calculation of loss for one output, i.e. reshape
def reshape_loss(y_true, y_pred):
    # do some math with these two
    return K.mean(y_pred)

# calculation of loss for another output, i.e. global_average_pooling2d
def gap_loss(y_true, y_pred):
    # do some math with these two
    return K.mean(y_pred)
Then, when compiling, you need to do this:
model.compile(
    optimizer=tf.keras.optimizers.Adam(lr=base_learning_rate),
    loss = {
        'reshape':reshape_loss,
        'global_average_pooling2d':gap_loss
    },
    loss_weights = {
        'reshape':1.,
        'global_average_pooling2d':1.
    }
)
Now, the total loss is 1.*reshape_loss + 1.*gap_loss, evaluated on the reshape and global_average_pooling2d outputs respectively.
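The weighted combination can be written out in plain Python. This is only an illustration of the sum, with a hypothetical helper combined_loss and made-up per-output loss values. Each loss function still sees only its own output, so this approach combines the results of the two losses rather than letting one function mix both outputs.

```python
def combined_loss(per_output_losses, loss_weights):
    """Weighted sum of per-output loss values, keyed by output name."""
    return sum(loss_weights[name] * value
               for name, value in per_output_losses.items())

# hypothetical loss values for one batch, keyed by output layer name
per_output_losses = {'reshape': 0.4, 'global_average_pooling2d': 1.1}
loss_weights = {'reshape': 1.0, 'global_average_pooling2d': 1.0}
total = combined_loss(per_output_losses, loss_weights)
```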
Using examples from Lipton et al (2016), target replication is basically calculating the loss at each time step (except final) of the LSTM (or GRU) and averaging this loss and adding it to the main loss while training. Mathematically, it is given by -
Graphically, it can be represented as -
So how exactly do I go about implementing this in Keras? Say I have a binary classification task, and my model is the simple one given below:
model = Sequential()
model.add(LSTM(50))
model.add(Dense(1))
model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])
model.fit(x_train, y_train, class_weight={0: 0.5, 1: 4})
1. I think y_train needs to be reshaped/tiled from (batch_size, 1) to (batch_size, time_step), right?
2. Does the dense layer need TimeDistributed to be applied correctly to the LSTM after setting return_sequences=True?
3. How exactly do I implement the loss function given above? Will the class weights need to be modified?
4. Target replication is used only during training. How do I implement validation set evaluation using only the main loss?
5. How should I deal with zero padding in target replication? My sequences are padded to a max_len of 15, with the average length being 7. Since the target replication loss averages over all the steps, how do I make sure it doesn't use the padded words in calculating the loss? Basically, I want to dynamically assign T the actual sequence length.
Question 1:
So, for the targets, you need it shaped as (batch_size, time_steps, 1). Just use:
y_train = np.stack([y_train]*time_steps, axis=1)
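As a quick check of the shapes, here is the same np.stack call on a tiny made-up label array; the values of time_steps and y_train are assumptions for illustration.

```python
import numpy as np

time_steps = 5                       # hypothetical sequence length
y_train = np.array([[0], [1], [1]])  # shape (batch_size, 1)

# repeat each label along a new time axis
y_tiled = np.stack([y_train] * time_steps, axis=1)
```

Each sample's single label is now repeated at every time step, giving shape (batch_size, time_steps, 1).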
Question 2:
You're correct, but TimeDistributed is optional in Keras 2.
Question 3:
I don't know how class weights will behave, but a regular loss function should go like:
from keras import backend as K
from keras.losses import binary_crossentropy

def target_replication_loss(alpha):
    def inner_loss(true,pred):
        losses = binary_crossentropy(true,pred)  # shape (batch_size, time_steps)
        return (alpha*K.mean(losses[:,:-1], axis=-1)) + ((1-alpha)*losses[:,-1])
    return inner_loss

model.compile(......, loss = target_replication_loss(alpha), ...)
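The same arithmetic can be checked with NumPy on a single short sequence. This is a stand-in for the Keras version above, not a replacement: the function name target_replication_loss_np, the clipping constant EPS, and the sample values are assumptions for illustration. It computes alpha times the mean loss over all steps except the last, plus (1 - alpha) times the loss at the last step.

```python
import numpy as np

EPS = 1e-7  # assumed clipping constant to avoid log(0)

def target_replication_loss_np(y_true, y_pred, alpha):
    """y_true and y_pred have shape (batch_size, time_steps, 1)."""
    p = np.clip(y_pred, EPS, 1 - EPS)
    step_losses = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    step_losses = step_losses.mean(axis=-1)          # (batch_size, time_steps)
    replicated = step_losses[:, :-1].mean(axis=-1)   # all steps but the last
    final = step_losses[:, -1]                       # main loss at the last step
    return alpha * replicated + (1 - alpha) * final

# one hypothetical sequence of three steps with a positive label
y_true = np.ones((1, 3, 1))
y_pred = np.array([[[0.5], [0.5], [0.9]]])
loss = target_replication_loss_np(y_true, y_pred, alpha=0.6)
```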
Question 3a:
Since the above doesn't work well with class weights, I created an alternative where the weights go into the loss:
def target_replication_loss(alpha, class_weights):
    def get_weights(x):
        b = class_weights[0]
        a = class_weights[1] - b
        return (a*x) + b

    def inner_loss(true,pred):
        #this will only work for classification with only one class 0 or 1
        #and only if the target is the same for all classes
        true_classes = true[:,-1,0]
        weights = get_weights(true_classes)
        losses = binary_crossentropy(true,pred)
        return weights*((alpha*K.mean(losses[:,:-1], axis=-1)) + ((1-alpha)*losses[:,-1]))
    return inner_loss
Question 4:
To avoid complexity, I'd say you should use an additional metric in validation:
def last_step_BC(true,pred):
    return binary_crossentropy(true[:,-1], pred[:,-1])

model.compile(....,
              loss = target_replication_loss(alpha),
              metrics=[last_step_BC])
Question 5:
This is a hard one and I'd need to research a little....
As an initial workaround, you can set the model with an input shape of (None, features), and train each sequence individually.
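Another possible direction, if the true sequence lengths are available, is to mask the padded steps before averaging. The NumPy sketch below only illustrates that idea on precomputed per-step losses. It assumes padding at the end of each sequence and lengths of at least 1; the function name masked_target_replication_np and the sample values are hypothetical, and wiring the lengths into a Keras loss is not shown.

```python
import numpy as np

def masked_target_replication_np(step_losses, lengths, alpha):
    """step_losses: (batch_size, max_len) per-step losses, padded at the end.
    lengths: (batch_size,) true sequence lengths, each >= 1."""
    batch_size, max_len = step_losses.shape
    steps = np.arange(max_len)[None, :]
    last = (lengths - 1)[:, None]
    # real steps that come before the final real step
    before_last = steps < last
    n_before = np.maximum(before_last.sum(axis=1), 1)  # avoid division by zero
    mean_before = (step_losses * before_last).sum(axis=1) / n_before
    # loss at the final real step of each sequence
    final = step_losses[np.arange(batch_size), lengths - 1]
    return alpha * mean_before + (1 - alpha) * final

# hypothetical losses; 99.0 marks padded positions that must be ignored
step_losses = np.array([[1.0, 2.0, 3.0, 99.0, 99.0],
                        [4.0, 4.0, 4.0, 4.0, 8.0]])
lengths = np.array([3, 5])
loss = masked_target_replication_np(step_losses, lengths, alpha=0.5)
```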
Working example without class_weight
def target_replication_loss(alpha):
    def inner_loss(true,pred):
        losses = binary_crossentropy(true,pred)
        #print(K.int_shape(losses))
        #print(K.int_shape(losses[:,:-1]))
        #print(K.int_shape(K.mean(losses[:,:-1], axis=-1)))
        #print(K.int_shape(losses[:,-1]))
        return (alpha*K.mean(losses[:,:-1], axis=-1)) + ((1-alpha)*losses[:,-1])
    return inner_loss
alpha = 0.6
i1 = Input((5,2))
i2 = Input((5,2))
out = LSTM(1, activation='sigmoid', return_sequences=True)(i1)
model = Model(i1, out)
model.compile(optimizer='adam', loss = target_replication_loss(alpha))
model.fit(np.arange(30).reshape((3,5,2)), np.arange(15).reshape((3,5,1)), epochs = 200)
Working example with class weights:
def target_replication_loss(alpha, class_weights):
    def get_weights(x):
        b = class_weights[0]
        a = class_weights[1] - b
        return (a*x) + b

    def inner_loss(true,pred):
        #this will only work for classification with only one class 0 or 1
        #and only if the target is the same for all classes
        true_classes = true[:,-1,0]
        weights = get_weights(true_classes)
        losses = binary_crossentropy(true,pred)
        print(K.int_shape(losses))
        print(K.int_shape(losses[:,:-1]))
        print(K.int_shape(K.mean(losses[:,:-1], axis=-1)))
        print(K.int_shape(losses[:,-1]))
        print(K.int_shape(weights))
        return weights*((alpha*K.mean(losses[:,:-1], axis=-1)) + ((1-alpha)*losses[:,-1]))
    return inner_loss
alpha = 0.6
class_weights={0: 0.5, 1:4.}
i1 = Input(batch_shape=(3,5,2))
i2 = Input((5,2))
out = LSTM(1, activation='sigmoid', return_sequences=True)(i1)
model = Model(i1, out)
model.compile(optimizer='adam', loss = target_replication_loss(alpha, class_weights))
model.fit(np.arange(30).reshape((3,5,2)), np.arange(15).reshape((3,5,1)), epochs = 200)
I'm trying to implement sentence similarity architecture based on this work using the STS dataset. Labels are normalized similarity scores from 0 to 1 so it is assumed to be a regression model.
My problem is that the loss goes directly to NaN starting from the first epoch. What am I doing wrong?
I have already tried updating to latest keras and theano versions.
The code for my model is:
def create_lstm_nn(input_dim):
    seq = Sequential()
    # embed using pretrained 300d embedding
    seq.add(Embedding(vocab_size, emb_dim, mask_zero=True, weights=[embedding_weights]))
    # encode via LSTM
    seq.add(LSTM(128))
    seq.add(Dropout(0.3))
    return seq
lstm_nn = create_lstm_nn(input_dim)
input_a = Input(shape=(input_dim,))
input_b = Input(shape=(input_dim,))
processed_a = lstm_nn(input_a)
processed_b = lstm_nn(input_b)
cos_distance = merge([processed_a, processed_b], mode='cos', dot_axes=1)
cos_distance = Reshape((1,))(cos_distance)
distance = Lambda(lambda x: 1-x)(cos_distance)
model = Model(input=[input_a, input_b], output=distance)
# train
rms = RMSprop()
model.compile(loss='mse', optimizer=rms)
model.fit([X1, X2], y, validation_split=0.3, batch_size=128, nb_epoch=20)
I also tried using a simple Lambda instead of the Merge layer, but it has the same result.
def cosine_distance(vests):
    x, y = vests
    x = K.l2_normalize(x, axis=-1)
    y = K.l2_normalize(y, axis=-1)
    return -K.mean(x * y, axis=-1, keepdims=True)

def cos_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0],1)

distance = Lambda(cosine_distance, output_shape=cos_dist_output_shape)([processed_a, processed_b])
NaN losses are a common issue in deep learning regression. Since you are using a Siamese network, you can try the following:
Check your data: does it need to be normalized?
Try adding a Dense layer as the last layer of your network, but be careful when picking its activation function, e.g. relu.
Try another loss function, e.g. contrastive_loss.
Lower your learning rate, e.g. to 0.0001.
The cos mode does not carefully handle division by zero, which might be the cause of the NaN.
It is not easy to make deep learning work perfectly.
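To illustrate the division-by-zero point, here is a minimal NumPy sketch of a cosine distance that clamps the norms away from zero. It shows the idea only and is not what the cos merge mode does internally; the function name safe_cosine_distance, the eps value, and the sample vectors are assumptions for illustration.

```python
import numpy as np

def safe_cosine_distance(x, y, eps=1e-7):
    """1 - cosine similarity along the last axis, with norms clamped to eps
    so that an all-zero vector cannot produce a division by zero."""
    x_norm = np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)
    y_norm = np.maximum(np.linalg.norm(y, axis=-1, keepdims=True), eps)
    cos = np.sum((x / x_norm) * (y / y_norm), axis=-1)
    return 1.0 - cos

# second row of `a` is all zeros, e.g. a fully masked or empty sequence
a = np.array([[1.0, 0.0], [0.0, 0.0]])
b = np.array([[1.0, 0.0], [0.0, 2.0]])
d = safe_cosine_distance(a, b)
```

With the clamp, the all-zero row yields a finite distance instead of NaN.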
I didn't run into the NaN issue, but my loss wouldn't change. I found the following, which may help:
def cosine_distance(shapes):
    y_true, y_pred = shapes

    def l2_normalize(x, axis):
        # clamp both the values and the norm with epsilon to avoid dividing by zero
        norm = K.sqrt(K.sum(K.square(x), axis=axis, keepdims=True))
        return K.sign(x) * K.maximum(K.abs(x), K.epsilon()) / K.maximum(norm, K.epsilon())

    y_true = l2_normalize(y_true, axis=-1)
    y_pred = l2_normalize(y_pred, axis=-1)
    return K.mean(1 - K.sum((y_true * y_pred), axis=-1))