I'm trying to implement a loss function by using the representations of the intermediate layers. As far as I know, the Keras backend custom loss function only accepts two input arguments(y_ture, and y-pred). How can I define a loss function with #tf.function and use it for a model that has been defined via Keras?
any help would be appreciated.
this a simple workaround to pass additional variables to your loss function. in our case, we pass the hidden output of one of our layers (x1). this output can be used to do something inside the loss function (I do a dummy operation)
def mse(y_true, y_pred, hidden):
error = y_true-y_pred
return K.mean(K.square(error)) + K.mean(hidden)
X = np.random.uniform(0,1, (1000,10))
y = np.random.uniform(0,1, 1000)
inp = Input((10,))
true = Input((1,))
x1 = Dense(32, activation='relu')(inp)
x2 = Dense(16, activation='relu')(x1)
out = Dense(1)(x2)
m = Model([inp,true], out)
m.add_loss( mse( true, out, x1 ) )
m.compile(loss=None, optimizer='adam')
m.fit(x=[X, y], y=None, epochs=3)
## final fitted model to compute predictions
final_m = Model(inp, out)
Related
i am using tensorflow/keras and i would like to use the input in the loss function
as per this answer here
Custom loss function in Keras based on the input data
I have created my loss function thusly
def custom_Loss_with_input(inp_1):
def loss(y_true, y_pred):
b = K.mean(inp_1)
return y_true - b
return loss
and set up the model with the layers and all ending like this
model = Model(inp_1, x)
model.compile(loss=custom_Loss_with_input(inp_1), optimizer= Ada)
return model
Nevertheless, i get the following error:
TypeError: Cannot convert a symbolic Keras input/output to a numpy array. This error may indicate that you're trying to pass a symbolic value to a NumPy call, which is not supported. Or, you may be trying to pass Keras symbolic inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically converting the API call to a lambda layer in the Functional Model.
Any advice on how to eliminate this error?
Thanks in advance
You can use add_loss to pass external layers to your loss, in your case the input tensor.
Here an example:
def CustomLoss(y_true, y_pred, input_tensor):
b = K.mean(input_tensor)
return K.mean(K.square(y_true - y_pred)) + b
X = np.random.uniform(0,1, (1000,10))
y = np.random.uniform(0,1, (1000,1))
inp = Input(shape=(10,))
hidden = Dense(32, activation='relu')(inp)
out = Dense(1)(hidden)
target = Input((1,))
model = Model([inp,target], out)
model.add_loss( CustomLoss( target, out, inp ) )
model.compile(loss=None, optimizer='adam')
model.fit(x=[X,y], y=None, epochs=3)
If your loss is composed of different parts and you want to track them you can add different losses corresponding to the loss parts. In this way, the losses are printed at the end of each epoch and are stored in model.history.history. Remember that the final loss minimized during training is the sum of the various loss parts.
def ALoss(y_true, y_pred):
return K.mean(K.square(y_true - y_pred))
def BLoss(input_tensor):
b = K.mean(input_tensor)
return b
X = np.random.uniform(0,1, (1000,10))
y = np.random.uniform(0,1, (1000,1))
inp = Input(shape=(10,))
hidden = Dense(32, activation='relu')(inp)
out = Dense(1)(hidden)
target = Input((1,))
model = Model([inp,target], out)
model.add_loss(ALoss( target, out ))
model.add_metric(ALoss( target, out ), name='a_loss')
model.add_loss(BLoss( inp ))
model.add_metric(BLoss( inp ), name='b_loss')
model.compile(loss=None, optimizer='adam')
model.fit(x=[X,y], y=None, epochs=3)
To use the model in inference mode (removing the target from inputs):
final_model = Model(model.input[0], model.output)
final_model.predict(X)
I am trying to create a convolutional neural network that has two regression outputs, a score and a confidence. I have frozen the layers they have in common in the hopes that the addition of the confidence output doesn't change the score, but in my experiments it has. For the model with just the score, I used Xception and added a simple GlobalAveragePooling2D and Dense(512) layer then output a single number.
base_model = Xception(input_shape=(224, 224, 3), weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(512, activation='relu')(x)
predictions = Dense(1, activation='sigmoid')(x)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:
layer.trainable = False
optimizer = Adam(learning_rate=learning_rate)
model.compile(loss='mae', optimizer=optimizer, metrics=['mse','mae'], run_eagerly=True)
Here is what the end of model.summary() looks like:
When I fit it, the model produces good results.
But when I try to add a second output the result of the first becomes much worse. The new model gets trained off tuples where is first number is the same as the first model and the second number is a confidence value. The model is very similar to the one above.
base_model = Xception(input_shape=(224, 224, 3), weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
score_x = Dense(512, activation='relu')(x)
score_out = Dense(1, activation='sigmoid', name='score_model')(score_x)
confidence_x = Dense(512, activation='relu')(x)
confidence_out = Dense(1, name='confidence_model')(confidence_x)
model = Model(inputs=base_model.input, outputs=[score_out, confidence_out])
for layer in base_model.layers:
layer.trainable = False
losses = {'score_model': 'mae', 'confidence_model': 'mae'}
loss_weights = {'score_model': 1, 'confidence_model': 1}
model.compile(loss=losses, loss_weights=loss_weights, optimizer=optimizer, metrics=['mse','mae'], run_eagerly=True)
When I look at model.summary(), it has twice as many trainable parameters as the previous model, which is exactly what I was expecting. Everything looks right to me so far.
But when I train this model the performance on the score is much worse. I was thinking it would be the same (within stochastic variation). After the first epoch, the loss from the first model is around 0.125. The score_model_loss from the second model is around 0.554. Clearly I'm not completely separating the models. What am I missing?
Note: This answer will work well only because the layer that do the feature extraction are frozen. As #Akshay Sehgal stated in the comments :
optimizing for 2 goals together is actually a completely different problem than optimizing 2 independent goals separately
In that case, we are optimizing for 2 goals separately.
The easiest solution is probably to write a custom training loop with 2 tf.GradientTape, one for each goal. Lets consider this really simple example:
Dummy data
Let's create some random Data
import tensorflow as tf
X = tf.random.normal((1000,1))
y1= 3*X + 1
y2 = -2*X +2
ds = tf.data.Dataset.from_tensor_slices((X,y1,y2)).batch(10)
Creating a model with 2 outputs
In that example, I skip the feature extraction step, as a simple linear regression will work for the data. But as your feature extractor network is frozen, the example is similar.
inp = tf.keras.Input((1,))
dense_1 = tf.keras.layers.Dense(1, name="objective1")(inp)
dense_2 = tf.keras.layers.Dense(1, name="objective2")(inp)
model = tf.keras.Model(inputs=inp, outputs=[dense_1, dense_2])
# setting up the loss functions as well as the optimizer
opt = tf.optimizers.SGD()
loss_func1 = tf.losses.mean_squared_error
loss_func2 = tf.losses.mean_absolute_error
Note the name given to the two dense layers: I will use them later to retrieve the appropriate weights.
Getting the weights to optimize
We can use the name set before to retrieve the variable belonging to each objective :
var1, var2 = [],[]
for l in model.layers:
if "objective1" in l.name:
var1 += l.trainable_variables
if "objective2" in l.name:
var2 += l.trainable_variables
The training loop
You simply need to tapes, one for each objective. You can use different optimizer as well, if it makes the training better.
counter = 0
for x, y1, y2 in ds:
counter += 1
with tf.GradientTape() as tape1, tf.GradientTape() as tape2:
pred1, pred2 = model(x)
loss1 = loss_func1(y1, pred1)
loss2 = loss_func2(y2, pred2)
grad1 = tape1.gradient(loss1, var1)
grad2 = tape2.gradient(loss2, var2)
opt.apply_gradients(zip(grad1, var1))
opt.apply_gradients(zip(grad2, var2))
if counter % 10:
print(f"Step : {counter}, objective1: {tf.reduce_mean(loss1)}, objective2: {tf.reduce_mean(loss2)}")
If we run the training, we get:
Step : 1, objective1: 4.609124183654785, objective2: 2.6634981632232666
[...]
Step : 99, objective1: 7.176481902227555e-14, objective2: 0.030187154188752174
The principle advantage training that way is that you just need to extract the features once for the two objectives.
I'm attempting to wrap my Keras neural network in a class object. I have implemented the below outside of a class setting, but I want to make this more object-friendly.
To summarize, my model calls function sequential_model which creates a sequential model. Within the compile step, I have defined my own loss function weighted_categorical_crossentropy which I want the sequential model to implement.
However, when I run the code below I get the following error: ValueError: No gradients provided for any variable:
I suspect the issue is with how I'm defining the weighted_categorical_crossentropy function for its use by sequential.
Again, I was able to get this work in a non-object oriented way. Any help will be much appreciated.
from tensorflow.keras import Sequential, backend as K
class MyNetwork():
def __init__(self, file, n_output=4, n_hidden=20, epochs=3,
dropout=0.10, batch_size=64, metrics = ['categorical_accuracy'],
optimizer = 'rmsprop', activation = 'softmax'):
[...] //Other Class attributes
def model(self):
self.model = self.sequential_model(False)
self.model.summary()
def sequential_model(self, val):
K.clear_session()
if val == False:
self.epochs = 3
regressor = Sequential()
#regressor.run_eagerly = True
regressor.add(LSTM(units = self.n_hidden, dropout=self.dropout, return_sequences = True, input_shape = (self.X.shape[1], self.X.shape[2])))
regressor.add(LSTM(units = self.n_hidden, dropout=self.dropout, return_sequences = True))
regressor.add(Dense(units = self.n_output, activation=self.activation))
self.weights = np.array([0.025,0.225,0.78,0.020])
regressor.compile(optimizer = self.optimizer, loss = self.weighted_categorical_crossentropy(self.weights), metrics = [self.metrics])
regressor.fit(self.X, self.Y*1.0,batch_size=self.batch_size, epochs=self.epochs, verbose=1, validation_data=(self.Xval, self.Yval*1.0))
return regressor
def weighted_categorical_crossentropy(self, weights):
weights = K.variable(weights)
def loss(y_true, y_pred):
y_pred /= K.sum(y_pred, axis=-1, keepdims=True)
y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
loss = y_true * K.log(y_pred) * weights
loss = -K.sum(loss, -1)
return loss
There are several problems with above code, but the most noticeable one is you don't return the loss from weighted_categorical_crossentropy. It should look more like:
def weighted_categorical_crossentropy(self, weights):
weights = K.variable(weights)
def loss(y_true, y_pred):
y_pred /= K.sum(y_pred, axis=-1, keepdims=True)
y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
loss = y_true * K.log(y_pred) * weights
loss = -K.sum(loss, -1)
return loss
return loss # Return the callable function!
The error is ValueError: No gradients provided for any variable because the loss method doesn't return anything, it returns None! If you try to fit a method with loss=None, the model will have no way of computing gradients and therefore it will throw the same exact error.
Next up is the that you are using return_sequences = True in the layer right before a non-recurrent layer. This causes the Dense layer to be called on mis-shaped data, that's appropriate only for recurrent layers. Don't use it like that.
If you have a good reason for using the return_sequences = True, then you must add Dense layer like:
model.add(keras.layers.TimeDistributed(keras.layers.Dense(...)))
This will cause the Dense layer to act on output sequence on every time step separately. This also means that your y_true must be of appropriate shape.
There could be other problems with the custom loss function that you defined, but I can not deduce the input/output shapes, so you will have to run it and add see if it works. There will probably be matrix multiplication shape mismatch.
Last but not least, think about using the Sub-classing API. Could it make any of your operations easier to write?
Thanks for reading and I'll update this answer once I have that info. Cheers.
I have modeled multivariate model with more than 100 different output layers in parallel.
I am able to get averaged loss function, but it is not really possible for me to get averaged accuracy values. (I am doing regression)
Would you be kind enough as to suggest any idea of how to do this in KERAS?
Thanks
I create a custom callback to do this
class MergeMetrics(Callback):
def __init__(self,**kargs):
super(MergeMetrics,self).__init__(**kargs)
def on_epoch_begin(self,epoch, logs={}):
return
def on_epoch_end(self, epoch, logs={}):
logs['merge_mse'] = np.mean([logs[m] for m in logs.keys() if 'mse' in m])
logs['merge_mae'] = np.mean([logs[m] for m in logs.keys() if 'mae' in m])
I use this callback to merge 2 metrics coming from 2 different outputs. I use a simple problem for example but you can integrate it easily in your problem and integrate it with a validation set
this is the dummy example where I use mse and mae as metrics
X = np.random.uniform(0,1, (1000,10))
y1 = np.random.uniform(0,1, 1000)
y2 = np.random.uniform(0,1, 1000)
inp = Input((10,))
x = Dense(32, activation='relu')(inp)
out1 = Dense(1, name='y1')(x)
out2 = Dense(1, name='y2')(x)
m = Model(inp, [out1,out2])
m.compile('adam','mae', metrics=['mse','mae'])
checkpoint = MergeMetrics()
m.fit(X, [y1,y2], epochs=10, callbacks=[checkpoint])
the printed output is:
loss: ... - y1_mse: 0.2227 - y1_mae: 0.3884 - y2_mse: 0.1163 - y2_mae: 0.2805 - merge_mse: 0.1695 - merge_mae: 0.3345
I am trying to stack two networks together. I want to calculate loss of each network separately. For example in the image below; loss of LSTM1 should be (Loss1 + Loss2) and loss of system should be just (Loss2)
I implemented a network like below with the idea above but have no idea how to compile and run it.
def build_lstm1():
x = Input(shape=(self.timesteps, self.input_dim,), name = 'input')
h = LSTM(1024, return_sequences=True))(x)
scores = TimeDistributed(Dense(self.input_dim, activation='sigmoid', name='dense'))(h)
LSTM1 = Model(x, scores)
return LSTM1
def build_lstm2():
x = Input(shape=(self.timesteps, self.input_dim,), name = 'input')
h = LSTM(1024, return_sequences=True))(x)
labels = TimeDistributed(Dense(self.input_dim, activation='sigmoid', name='dense'))(h)
LSTM2 = Model(x, labels)
return LSTM2
lstm1 = build_lstm1()
lstm2 = build_lstm2()
combined = Model(inputs = lstm1.input ,
outputs = [lstm1.output,
lstm2(lstm1.output).output)])
This is a wrong way of using the Model functional API of Keras. Also it's not possible to have loss of LSTM1 as Loss1+ Loss2. It will be only Loss1. Similarly for LSTM2 it will be only Loss2. However, for the combined network you can have any linear combination of Loss1 and Loss2 as the overall Loss i.e.
Loss_overall = a.Loss1 + b.Loss2. where a,b are non-negative real numbers
The real essence of Model Functional API is that it allows you create Deep learning architecture with multiple outputs and multiple inputs in single model.
def build_lstm_combined():
x = Input(shape=(self.timesteps, self.input_dim,), name = 'input')
h_1 = LSTM(1024, return_sequences=True))(x)
scores = TimeDistributed(Dense(self.input_dim, activation='sigmoid', name='dense'))(h_1)
h_2 = LSTM(1024, return_sequences=True))(h_1)
labels = TimeDistributed(Dense(self.input_dim, activation='sigmoid', name='dense'))(h_2)
LSTM_combined = Model(x,[scores,labels])
return LSTM_combined
This combined model has loss which is combination of both Loss1 and Loss2. While compiling the model you can specify the weights of each loss to get overall loss. If your desired loss is 0.5Loss1 + Loss2 you can do this by:
model_1 = build_lstm_combined()
model_1.compile(optimizer=Adam(0.001), loss = ['categorical_crossentropy','categorical_crossentropy'],loss_weights= [0.5,1])