Related
I am trying to create a convolutional neural network that has two regression outputs, a score and a confidence. I have frozen the layers they have in common in the hopes that the addition of the confidence output doesn't change the score, but in my experiments it has. For the model with just the score, I used Xception and added a simple GlobalAveragePooling2D and Dense(512) layer then output a single number.
base_model = Xception(input_shape=(224, 224, 3), weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(512, activation='relu')(x)
predictions = Dense(1, activation='sigmoid')(x)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:
layer.trainable = False
optimizer = Adam(learning_rate=learning_rate)
model.compile(loss='mae', optimizer=optimizer, metrics=['mse','mae'], run_eagerly=True)
Here is what the end of model.summary() looks like:
When I fit it, the model produces good results.
But when I try to add a second output the result of the first becomes much worse. The new model gets trained off tuples where is first number is the same as the first model and the second number is a confidence value. The model is very similar to the one above.
base_model = Xception(input_shape=(224, 224, 3), weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
score_x = Dense(512, activation='relu')(x)
score_out = Dense(1, activation='sigmoid', name='score_model')(score_x)
confidence_x = Dense(512, activation='relu')(x)
confidence_out = Dense(1, name='confidence_model')(confidence_x)
model = Model(inputs=base_model.input, outputs=[score_out, confidence_out])
for layer in base_model.layers:
layer.trainable = False
losses = {'score_model': 'mae', 'confidence_model': 'mae'}
loss_weights = {'score_model': 1, 'confidence_model': 1}
model.compile(loss=losses, loss_weights=loss_weights, optimizer=optimizer, metrics=['mse','mae'], run_eagerly=True)
When I look at model.summary(), it has twice as many trainable parameters as the previous model, which is exactly what I was expecting. Everything looks right to me so far.
But when I train this model the performance on the score is much worse. I was thinking it would be the same (within stochastic variation). After the first epoch, the loss from the first model is around 0.125. The score_model_loss from the second model is around 0.554. Clearly I'm not completely separating the models. What am I missing?
Note: This answer will work well only because the layer that do the feature extraction are frozen. As #Akshay Sehgal stated in the comments :
optimizing for 2 goals together is actually a completely different problem than optimizing 2 independent goals separately
In that case, we are optimizing for 2 goals separately.
The easiest solution is probably to write a custom training loop with 2 tf.GradientTape, one for each goal. Lets consider this really simple example:
Dummy data
Let's create some random Data
import tensorflow as tf
X = tf.random.normal((1000,1))
y1= 3*X + 1
y2 = -2*X +2
ds = tf.data.Dataset.from_tensor_slices((X,y1,y2)).batch(10)
Creating a model with 2 outputs
In that example, I skip the feature extraction step, as a simple linear regression will work for the data. But as your feature extractor network is frozen, the example is similar.
inp = tf.keras.Input((1,))
dense_1 = tf.keras.layers.Dense(1, name="objective1")(inp)
dense_2 = tf.keras.layers.Dense(1, name="objective2")(inp)
model = tf.keras.Model(inputs=inp, outputs=[dense_1, dense_2])
# setting up the loss functions as well as the optimizer
opt = tf.optimizers.SGD()
loss_func1 = tf.losses.mean_squared_error
loss_func2 = tf.losses.mean_absolute_error
Note the name given to the two dense layers: I will use them later to retrieve the appropriate weights.
Getting the weights to optimize
We can use the name set before to retrieve the variable belonging to each objective :
var1, var2 = [],[]
for l in model.layers:
if "objective1" in l.name:
var1 += l.trainable_variables
if "objective2" in l.name:
var2 += l.trainable_variables
The training loop
You simply need to tapes, one for each objective. You can use different optimizer as well, if it makes the training better.
counter = 0
for x, y1, y2 in ds:
counter += 1
with tf.GradientTape() as tape1, tf.GradientTape() as tape2:
pred1, pred2 = model(x)
loss1 = loss_func1(y1, pred1)
loss2 = loss_func2(y2, pred2)
grad1 = tape1.gradient(loss1, var1)
grad2 = tape2.gradient(loss2, var2)
opt.apply_gradients(zip(grad1, var1))
opt.apply_gradients(zip(grad2, var2))
if counter % 10:
print(f"Step : {counter}, objective1: {tf.reduce_mean(loss1)}, objective2: {tf.reduce_mean(loss2)}")
If we run the training, we get:
Step : 1, objective1: 4.609124183654785, objective2: 2.6634981632232666
[...]
Step : 99, objective1: 7.176481902227555e-14, objective2: 0.030187154188752174
The principle advantage training that way is that you just need to extract the features once for the two objectives.
I use Python 3.7 and Keras 2.2.4. I created a Keras model with two output layers:
self.df_model = Model(inputs=input, outputs=[out1,out2])
As the loss history only returns one loss value per epoch, I want to get the loss of each output layer. How is it possible to get two loss values per epoch, one for each output layer?
Each model in Keras has a default History callback which stores all the loss and metric values of all the epochs, both the aggregate values as well as per output layer. This callback creates a History object which is returned when fit model is called and you can access all of these values by using the history property of that object (it is actually a dictionary):
history = model.fit(...)
print(history.history) # <-- a dict which contains all the loss and metric values per epoch
A minimal reproducible example:
from keras import layers
from keras import Model
import numpy as np
inp = layers.Input((1,))
out1 = layers.Dense(2, name="output1")(inp)
out2 = layers.Dense(3, name="output2")(inp)
model = Model(inp, [out1, out2])
model.compile(loss='mse', optimizer='adam')
x = np.random.rand(2, 1)
y1 = np.random.rand(2, 2)
y2 = np.random.rand(2, 3)
history = model.fit(x, [y1,y2], epochs=5)
print(history.history)
#{'loss': [1.0881365537643433, 1.084699034690857, 1.081269383430481, 1.0781562328338623, 1.0747418403625488],
# 'output1_loss': [0.87154925, 0.8690172, 0.86648905, 0.8641926, 0.8616721],
# 'output2_loss': [0.21658726, 0.21568182, 0.2147803, 0.21396361, 0.2130697]}
I want to replace the loss function related to my neural network during training, this is the network:
model = tensorflow.keras.models.Sequential()
model.add(tensorflow.keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu", input_shape=input_shape))
model.add(tensorflow.keras.layers.Conv2D(64, (3, 3), activation="relu"))
model.add(tensorflow.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tensorflow.keras.layers.Dropout(0.25))
model.add(tensorflow.keras.layers.Flatten())
model.add(tensorflow.keras.layers.Dense(128, activation="relu"))
model.add(tensorflow.keras.layers.Dropout(0.5))
model.add(tensorflow.keras.layers.Dense(output_classes, activation="softmax"))
model.compile(loss=tensorflow.keras.losses.categorical_crossentropy, optimizer=tensorflow.keras.optimizers.Adam(0.001), metrics=['accuracy'])
history = model.fit(x_train, y_train, batch_size=128, epochs=5, validation_data=(x_test, y_test))
so now I want to change tensorflow.keras.losses.categorical_crossentropy with another, so I made this:
model.compile(loss=tensorflow.keras.losses.mse, optimizer=tensorflow.keras.optimizers.Adam(0.001), metrics=['accuracy'])
history = model.fit(x_improve, y_improve, epochs=1, validation_data=(x_test, y_test)) #FIXME bug during training
but I have this error:
ValueError: No gradients provided for any variable: ['conv2d/kernel:0', 'conv2d/bias:0', 'conv2d_1/kernel:0', 'conv2d_1/bias:0', 'dense/kernel:0', 'dense/bias:0', 'dense_1/kernel:0', 'dense_1/bias:0'].
Why? How can I fix it? There is another way to change loss function?
Thanks
I'm currently working on google colab with Tensorflow and Keras and i was not able to recompile a model mantaining the weights, every time i recompile a model like this:
with strategy.scope():
model = hd_unet_model(INPUT_SIZE)
model.compile(optimizer=Adam(lr=0.01),
loss=tf.keras.losses.MeanSquaredError() ,
metrics=[tf.keras.metrics.MeanSquaredError()])
the weights gets resetted.
so i found an other solution, all you need to do is:
Get the model with the weights you want ( load it or something else )
gets the weights of the model like this:
weights = model.get_weights()
recompile the model ( to change the loss function )
set again the weights of the recompiled model like this:
model.set_weights(weights)
launch the training
i tested this method and it seems to work.
so to change the loss mid-Training you can:
Compile with the first loss.
Train of the first loss.
Save the weights.
Recompile with the second loss.
Load the weights.
Train on the second loss.
So, a straightforward answer I would give is: switch to pytorch if you want to play this kind of games. Since in pytorch you define your training and evaluation functions, it takes just an if statement to switch from a loss function to another one.
Also, I see in your code that you want to switch from cross_entropy to mean_square_error, the former is suitable for classification the latter for regression, so this is not really something you can do, in the code that follows I switched from mean squared error to mean squared logarithmic error, which are both loss suitable for regression.
Despite other answers offers solutions to your question (see change-loss-function-dynamically-during-training) it is not clear wether you can trust or not the results. Some people found that even with a customised function sometimes Keras keep training with the first loss.
Solution:
My solution is based on train_on_batch, which allows us to train a model in a for loop and therefore stop training it whenever we prefer to recompile the model with a new loss function. Please note that recompiling the model does not reset the weights (see:Does recompiling a model re-initialize the weights?).
The dataset can be found here Boston housing dataset
# Regression Example With Boston Dataset: Standardized and Larger
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from keras.losses import mean_squared_error, mean_squared_logarithmic_error
from matplotlib import pyplot
import matplotlib.pyplot as plt
# load dataset
dataframe = read_csv("housing.csv", delim_whitespace=True, header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:13]
y = dataset[:,13]
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.33, random_state=42)
# create model
model = Sequential()
model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
model.add(Dense(6, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal'))
batch_size = 25
# have to define manually a dict to store all epochs scores
history = {}
history['history'] = {}
history['history']['loss'] = []
history['history']['mean_squared_error'] = []
history['history']['mean_squared_logarithmic_error'] = []
history['history']['val_loss'] = []
history['history']['val_mean_squared_error'] = []
history['history']['val_mean_squared_logarithmic_error'] = []
# first compiling with mse
model.compile(loss='mean_squared_error', optimizer='adam', metrics=[mean_squared_error, mean_squared_logarithmic_error])
# define number of iterations in training and test
train_iter = round(trainX.shape[0]/batch_size)
test_iter = round(testX.shape[0]/batch_size)
for epoch in range(2):
# train iterations
loss, mse, msle = 0, 0, 0
for i in range(train_iter):
start = i*batch_size
end = i*batch_size + batch_size
batchX = trainX[start:end,]
batchy = trainy[start:end,]
loss_, mse_, msle_ = model.train_on_batch(batchX,batchy)
loss += loss_
mse += mse_
msle += msle_
history['history']['loss'].append(loss/train_iter)
history['history']['mean_squared_error'].append(mse/train_iter)
history['history']['mean_squared_logarithmic_error'].append(msle/train_iter)
# test iterations
val_loss, val_mse, val_msle = 0, 0, 0
for i in range(test_iter):
start = i*batch_size
end = i*batch_size + batch_size
batchX = testX[start:end,]
batchy = testy[start:end,]
val_loss_, val_mse_, val_msle_ = model.test_on_batch(batchX,batchy)
val_loss += val_loss_
val_mse += val_mse_
val_msle += msle_
history['history']['val_loss'].append(val_loss/test_iter)
history['history']['val_mean_squared_error'].append(val_mse/test_iter)
history['history']['val_mean_squared_logarithmic_error'].append(val_msle/test_iter)
# recompiling the model with new loss
model.compile(loss='mean_squared_logarithmic_error', optimizer='adam', metrics=[mean_squared_error, mean_squared_logarithmic_error])
for epoch in range(2):
# train iterations
loss, mse, msle = 0, 0, 0
for i in range(train_iter):
start = i*batch_size
end = i*batch_size + batch_size
batchX = trainX[start:end,]
batchy = trainy[start:end,]
loss_, mse_, msle_ = model.train_on_batch(batchX,batchy)
loss += loss_
mse += mse_
msle += msle_
history['history']['loss'].append(loss/train_iter)
history['history']['mean_squared_error'].append(mse/train_iter)
history['history']['mean_squared_logarithmic_error'].append(msle/train_iter)
# test iterations
val_loss, val_mse, val_msle = 0, 0, 0
for i in range(test_iter):
start = i*batch_size
end = i*batch_size + batch_size
batchX = testX[start:end,]
batchy = testy[start:end,]
val_loss_, val_mse_, val_msle_ = model.test_on_batch(batchX,batchy)
val_loss += val_loss_
val_mse += val_mse_
val_msle += msle_
history['history']['val_loss'].append(val_loss/test_iter)
history['history']['val_mean_squared_error'].append(val_mse/test_iter)
history['history']['val_mean_squared_logarithmic_error'].append(val_msle/test_iter)
# Some plots to check what is going on
# loss function
pyplot.subplot(311)
pyplot.title('Loss')
pyplot.plot(history['history']['loss'], label='train')
pyplot.plot(history['history']['val_loss'], label='test')
pyplot.legend()
# Only mean squared error
pyplot.subplot(312)
pyplot.title('Mean Squared Error')
pyplot.plot(history['history']['mean_squared_error'], label='train')
pyplot.plot(history['history']['val_mean_squared_error'], label='test')
pyplot.legend()
# Only mean squared logarithmic error
pyplot.subplot(313)
pyplot.title('Mean Squared Logarithmic Error')
pyplot.plot(history['history']['mean_squared_logarithmic_error'], label='train')
pyplot.plot(history['history']['val_mean_squared_logarithmic_error'], label='test')
pyplot.legend()
plt.tight_layout()
pyplot.show()
The resulting plot confirm that the loss function is changing after the second epoch:
The drop in the loss function is due to the fact that the model is switching from normal mean squared error to the logarithmic one, which has much lower values. Printing the scores also prove that the used loss truly changed:
print(history['history']['loss'])
[599.5209197998047, 570.4041115897043, 3.8622902120862688, 2.1578191178185597]
print(history['history']['mean_squared_error'])
[599.5209197998047, 570.4041115897043, 510.29034205845426, 425.32058388846264]
print(history['history']['mean_squared_logarithmic_error'])
[8.624503476279122, 6.346359729766846, 3.8622902120862688, 2.1578191178185597]
In the first two epochs the values of loss are equal to ones of mean_square_error and during the third and fourth epochs the values becomes equal to the ones of mean_square_logarithmic_error, which is the new loss that was set. So it seems that using train_on_batch allows to change loss function, nevertheless I want to stress out again that this is basically what one should do on pytoch to achieve the same results, with the difference that the behaviour of pytorch (in this scenario and in my opinion) is more reliable.
I am doing a slight modification of a standard neural network by defining a custom loss function. The custom loss function depends not only on y_true and y_pred, but also on the training data. I implemented it using the wrapping solution described here.
Specifically, I wanted to define a custom loss function that is the standard mse plus the mse between the input and the square of y_pred:
def custom_loss(x_true)
def loss(y_true, y_pred):
return K.mean(K.square(y_pred - y_true) + K.square(y_true - x_true))
return loss
Then I compile the model using
model_custom.compile(loss = custom_loss( x_true=training_data ), optimizer='adam')
fit the model using
model_custom.fit(training_data, training_label, epochs=100, batch_size = training_data.shape[0])
All of the above works fine, because the batch size is actually the number of all the training samples.
But if I set a different batch_size (e.g., 10) when I have 1000 training samples, there will be an error
Incompatible shapes: [1000] vs. [10].
It seems that Keras is able to automatically adjust the size of the inputs to its own loss function base on the batch size, but cannot do so for the custom loss function.
Do you know how to solve this issue?
Thank you!
==========================================================================
* Update: the batch size issue is solved, but another issue occurred
Thank you, Ori, for the suggestion of concatenating the input and output layers! It "worked", in the sense that the codes can run under any batch size. However, it seems that the result from training the new model is wrong... Below is a simplified version of the codes to demonstrate the problem:
import numpy as np
import scipy.io
import keras
from keras import backend as K
from keras.models import Model
from keras.layers import Input, Dense, Activation
from numpy.random import seed
from tensorflow import set_random_seed
def custom_loss(y_true, y_pred): # this is essentially the mean_square_error
mse = K.mean( K.square( y_pred[:,2] - y_true ) )
return mse
# set the seeds so that we get the same initialization across different trials
seed_numpy = 0
seed_tensorflow = 0
# generate data of x = [ y^3 y^2 ]
y = np.random.rand(5000+1000,1) * 2 # generate 5000 training and 1000 testing samples
x = np.concatenate( ( np.power(y, 3) , np.power(y, 2) ) , axis=1 )
training_data = x[0:5000:1,:]
training_label = y[0:5000:1]
testing_data = x[5000:6000:1,:]
testing_label = y[5000:6000:1]
# build the standard neural network with one hidden layer
seed(seed_numpy)
set_random_seed(seed_tensorflow)
input_standard = Input(shape=(2,)) # input
hidden_standard = Dense(10, activation='relu', input_shape=(2,))(input_standard) # hidden layer
output_standard = Dense(1, activation='linear')(hidden_standard) # output layer
model_standard = Model(inputs=[input_standard], outputs=[output_standard]) # build the model
model_standard.compile(loss='mean_squared_error', optimizer='adam') # compile the model
model_standard.fit(training_data, training_label, epochs=50, batch_size = 500) # train the model
testing_label_pred_standard = model_standard.predict(testing_data) # make prediction
# get the mean squared error
mse_standard = np.sum( np.power( testing_label_pred_standard - testing_label , 2 ) ) / 1000
# build the neural network with the custom loss
seed(seed_numpy)
set_random_seed(seed_tensorflow)
input_custom = Input(shape=(2,)) # input
hidden_custom = Dense(10, activation='relu', input_shape=(2,))(input_custom) # hidden layer
output_custom_temp = Dense(1, activation='linear')(hidden_custom) # output layer
output_custom = keras.layers.concatenate([input_custom, output_custom_temp])
model_custom = Model(inputs=[input_custom], outputs=[output_custom]) # build the model
model_custom.compile(loss = custom_loss, optimizer='adam') # compile the model
model_custom.fit(training_data, training_label, epochs=50, batch_size = 500) # train the model
testing_label_pred_custom = model_custom.predict(testing_data) # make prediction
# get the mean squared error
mse_custom = np.sum( np.power( testing_label_pred_custom[:,2:3:1] - testing_label , 2 ) ) / 1000
# compare the result
print( [ mse_standard , mse_custom ] )
Basically, I have a standard one-hidden-layer neural network, and a custom one-hidden-layer neural network whose output layer is concatenated with the input layer. For testing purpose, I did not use the concatenated input layer in the custom loss function, because I wanted to see if the custom network can reproduce the standard neural network. Since the custom loss function is equivalent to the standard 'mean_squared_error' loss, both networks should have the same training results (I also reset the random seeds to make sure that they have the same initialization).
However, the training results are very different. It seems that the concatenation makes the training process different? Any ideas?
Thank you again for all your help!
Final update: Ori's approach of concatenating input and output layers works, and is verified by using the generator. Thanks!!
The problem is that when compiling the model, you set x_true to be a static tensor, in the size of all the samples. While the input for keras loss functions are the y_true and y_pred, where each of them is of size [batch_size, :].
As I see it there are 2 options you can solve this, the first one is using a generator for creating the batches, in such a way that you will have control over which indices are evaluated each time, and at the loss function you could slice the x_true tensor to fit the samples being evaluated:
def custom_loss(x_true)
def loss(y_true, y_pred):
x_true_samples = relevant_samples(x_true)
return K.mean(K.square(y_pred - y_true) + K.square(y_true - x_true_samples))
return loss
This solution can be complicated, what I would suggest is a simpler workaround -
Concatenate the input layer with the output layer, such that your new output will be of the form original_output , input.
Now you can use a new modified loss function:
def loss(y_true, y_pred):
return K.mean(K.square(y_pred[:,:output_shape] - y_true[:,:output_shape]) +
K.square(y_true[:,:output_shape] - y_pred[:,outputshape:))
Now your new loss function will take in account both the input data, and the prediction.
Edit:
Note that while you set the seed, your models are not exactly the same, and as you did not use a generator, you let keras choose the batches, and for different models he might pick different samples.
As your model does not converge, different samples can lead to different results.
I added a generator to your code, to verify the samples we pick for training, now you can see both results are the same:
def custom_loss(y_true, y_pred): # this is essentially the mean_square_error
mse = keras.losses.mean_squared_error(y_true, y_pred[:,2])
return mse
def generator(x, y, batch_size):
curIndex = 0
batch_x = np.zeros((batch_size,2))
batch_y = np.zeros((batch_size,1))
while True:
for i in range(batch_size):
batch_x[i] = x[curIndex,:]
batch_y[i] = y[curIndex,:]
i += 1;
if i == 5000:
i = 0
yield batch_x, batch_y
# set the seeds so that we get the same initialization across different trials
seed_numpy = 0
seed_tensorflow = 0
# generate data of x = [ y^3 y^2 ]
y = np.random.rand(5000+1000,1) * 2 # generate 5000 training and 1000 testing samples
x = np.concatenate( ( np.power(y, 3) , np.power(y, 2) ) , axis=1 )
training_data = x[0:5000:1,:]
training_label = y[0:5000:1]
testing_data = x[5000:6000:1,:]
testing_label = y[5000:6000:1]
batch_size = 32
# build the standard neural network with one hidden layer
seed(seed_numpy)
set_random_seed(seed_tensorflow)
input_standard = Input(shape=(2,)) # input
hidden_standard = Dense(10, activation='relu', input_shape=(2,))(input_standard) # hidden layer
output_standard = Dense(1, activation='linear')(hidden_standard) # output layer
model_standard = Model(inputs=[input_standard], outputs=[output_standard]) # build the model
model_standard.compile(loss='mse', optimizer='adam') # compile the model
#model_standard.fit(training_data, training_label, epochs=50, batch_size = 10) # train the model
model_standard.fit_generator(generator(training_data,training_label,batch_size), steps_per_epoch= 32, epochs= 100)
testing_label_pred_standard = model_standard.predict(testing_data) # make prediction
# get the mean squared error
mse_standard = np.sum( np.power( testing_label_pred_standard - testing_label , 2 ) ) / 1000
# build the neural network with the custom loss
seed(seed_numpy)
set_random_seed(seed_tensorflow)
input_custom = Input(shape=(2,)) # input
hidden_custom = Dense(10, activation='relu', input_shape=(2,))(input_custom) # hidden layer
output_custom_temp = Dense(1, activation='linear')(hidden_custom) # output layer
output_custom = keras.layers.concatenate([input_custom, output_custom_temp])
model_custom = Model(inputs=input_custom, outputs=output_custom) # build the model
model_custom.compile(loss = custom_loss, optimizer='adam') # compile the model
#model_custom.fit(training_data, training_label, epochs=50, batch_size = 10) # train the model
model_custom.fit_generator(generator(training_data,training_label,batch_size), steps_per_epoch= 32, epochs= 100)
testing_label_pred_custom = model_custom.predict(testing_data)
# get the mean squared error
mse_custom = np.sum( np.power( testing_label_pred_custom[:,2:3:1] - testing_label , 2 ) ) / 1000
# compare the result
print( [ mse_standard , mse_custom ] )
Using examples from Lipton et al (2016), target replication is basically calculating the loss at each time step (except final) of the LSTM (or GRU) and averaging this loss and adding it to the main loss while training. Mathematically, it is given by -
Graphically, it can be represented as -
So how do I go about exactly implementing this in Keras? Say, I have binary classification task. Let's say my model is a simple one given below -
model.add(LSTM(50))
model.add(Dense(1))
model.compile(loss='binary_crossentropy', class_weights={0:0.5, 1:4}, optimizer=Adam(), metrics=['accuracy'])
model.fit(x_train, y_train)
I think y_train needs to be reshaped/tiled from (batch_size, 1) to (batch_size, time_step) right?
The dense layer needs TimeDistributed to be applied correctly to the LSTM after setting return_sequences=True?
How do I exactly implement the exact loss function given above? Will class_weights need to be modified?
Target replication is only during training. How to implement validation set evaluation using only the main loss?
How should I deal with zero paddings in target replication? My sequences are padded to a max_len of 15 with average length being 7. Since the target replication loss averages over all the steps, how do I make sure it doesn't use the padded words in calculating the loss? Basically, dynamically assign T the actual sequence length.
Question 1:
So, for the targets, you need it shaped as (batch_size, time_steps, 1). Just use:
y_train = np.stack([y_train]*time_steps, axis=1)
Question 2:
You're correct, but TimeDistributed is optional in Keras 2.
Question 3:
I don't know how class weights will behave, but a regular loss function should go like:
from keras.losses import binary_crossentropy
def target_replication_loss(alpha):
def inner_loss(true,pred):
losses = binary_crossentropy(true,pred)
return (alpha*K.mean(losses[:,:-1], axis=-1)) + ((1-alpha)*losses[:,-1])
return inner_loss
model.compile(......, loss = target_replication_loss(alpha), ...)
Question 3a:
Since the above doens't work well with class weights, I created an alternative where the weights go into the loss:
def target_replication_loss(alpha, class_weights):
def get_weights(x):
b = class_weights[0]
a = class_weights[1] - b
return (a*x) + b
def inner_loss(true,pred):
#this will only work for classification with only one class 0 or 1
#and only if the target is the same for all classes
true_classes = true[:,-1,0]
weights = get_weights(true_classes)
losses = binary_crossentropy(true,pred)
return weights*((alpha*K.mean(losses[:,:-1], axis=-1)) + ((1-alpha)*losses[:,-1]))
return inner_loss
Question 4:
To avoid complexity, I'd say you should use an additional metric in validation:
def last_step_BC(true,pred):
return binary_crossentropy(true[:,-1], pred[:,-1])
model.compile(....,
loss = target_replication_loss(alpha),
metrics=[last_step_BC])
Question 5:
This is a hard one and I'd need to research a little....
As an initial workaround, you can set the model with an input shape of (None, features), and train each sequence individually.
Working example without class_weight
def target_replication_loss(alpha):
def inner_loss(true,pred):
losses = binary_crossentropy(true,pred)
#print(K.int_shape(losses))
#print(K.int_shape(losses[:,:-1]))
#print(K.int_shape(K.mean(losses[:,:-1], axis=-1)))
#print(K.int_shape(losses[:,-1]))
return (alpha*K.mean(losses[:,:-1], axis=-1)) + ((1-alpha)*losses[:,-1])
return inner_loss
alpha = 0.6
i1 = Input((5,2))
i2 = Input((5,2))
out = LSTM(1, activation='sigmoid', return_sequences=True)(i1)
model = Model(i1, out)
model.compile(optimizer='adam', loss = target_replication_loss(alpha))
model.fit(np.arange(30).reshape((3,5,2)), np.arange(15).reshape((3,5,1)), epochs = 200)
Working example with class weights:
def target_replication_loss(alpha, class_weights):
def get_weights(x):
b = class_weights[0]
a = class_weights[1] - b
return (a*x) + b
def inner_loss(true,pred):
#this will only work for classification with only one class 0 or 1
#and only if the target is the same for all classes
true_classes = true[:,-1,0]
weights = get_weights(true_classes)
losses = binary_crossentropy(true,pred)
print(K.int_shape(losses))
print(K.int_shape(losses[:,:-1]))
print(K.int_shape(K.mean(losses[:,:-1], axis=-1)))
print(K.int_shape(losses[:,-1]))
print(K.int_shape(weights))
return weights*((alpha*K.mean(losses[:,:-1], axis=-1)) + ((1-alpha)*losses[:,-1]))
return inner_loss
alpha = 0.6
class_weights={0: 0.5, 1:4.}
i1 = Input(batch_shape=(3,5,2))
i2 = Input((5,2))
out = LSTM(1, activation='sigmoid', return_sequences=True)(i1)
model = Model(i1, out)
model.compile(optimizer='adam', loss = target_replication_loss(alpha, class_weights))
model.fit(np.arange(30).reshape((3,5,2)), np.arange(15).reshape((3,5,1)), epochs = 200)