I have the following model:
def get_model():
    epochs = 100
    learning_rate = 0.1
    decay_rate = learning_rate / epochs
    inp = keras.Input(shape=(64, 101, 1), name="inputs")
    x = layers.Conv2D(128, kernel_size=(3, 3), strides=(3, 3), padding="same")(inp)
    x = layers.Conv2D(256, kernel_size=(3, 3), strides=(3, 3), padding="same")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(150)(x)
    x = layers.Dense(150)(x)
    out1 = layers.Dense(40000, name="sf_vec")(x)
    out2 = layers.Dense(128, name="ls_weights")(x)
    model = keras.Model(inp, [out1, out2], name="2_out_model")
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=decay_rate),  # if needed, set back to 0.001
                  loss="mean_squared_error")
    keras.utils.plot_model(model, to_file='model.png', show_shapes=True, show_layer_names=True)
    model.summary()
    return model
That is, I want to train my neural network on a "mix" of the loss from the first output and the loss from the second output.
I train my neural network in this way:
model.fit(x_train, [sf_train, ls_filters_train], epochs=10)
During training, output like this is shown, for example:
Epoch 10/10 -> loss: 0.0702 - sf_vec_loss: 0.0666 - ls_weights_loss: 0.0035
I'd like to know whether it is a coincidence that "loss" is nearly the sum of sf_vec_loss and ls_weights_loss, or whether Keras actually computes it this way.
Also, is the network being trained on "loss" only?
Thank you in advance :)
Following the TensorFlow documentation for Model.compile:
From the loss argument:
If the model has multiple outputs, you can use a different loss on
each output by passing a dictionary or a list of losses. The loss
value that will be minimized by the model will then be the sum of all
individual losses
Remember that you can also weight the loss contributions of the different model outputs.
From the loss_weights argument:
The loss value that will be minimized by the model will then be the
weighted sum of all individual losses, weighted by the loss_weights coefficients
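As a rough check of the two quotes above, here is a minimal pure-Python sketch of the arithmetic. It is not Keras code: total_loss is a hypothetical helper, and the weights in the second call are made up for illustration. The per-output values are the ones from the training log in the question.

```python
def total_loss(per_output_losses, loss_weights=None):
    """Weighted sum of per-output losses (hypothetical helper, not a Keras API)."""
    if loss_weights is None:
        # No loss_weights given: every output contributes with weight 1.
        loss_weights = {name: 1.0 for name in per_output_losses}
    return sum(loss_weights[name] * value
               for name, value in per_output_losses.items())

# Values printed at epoch 10 in the question.
logged = {"sf_vec": 0.0666, "ls_weights": 0.0035}

# Plain sum, as when no loss_weights are passed to compile().
unweighted = total_loss(logged)

# Illustrative weights only: give the second output ten times more influence.
weighted = total_loss(logged, {"sf_vec": 1.0, "ls_weights": 10.0})

print(round(unweighted, 4), round(weighted, 4))
```

The plain sum is 0.0701, while the log shows 0.0702. The small gap is most likely because each printed value is rounded to four decimals, so the total is the sum of the unrounded losses.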
Related
My model has two inputs and I want to calculate the loss for each input separately, because the loss of input 2 has to be multiplied by a weight. The two losses are then added up to form the final loss of the model.
This is my model:
def final_loss(y_true, y_pred):
    loss = x_loss_value.output + y_model.output*weight
    return loss
def mymodel(input_shape): #pooling=max or avg
    img_input1 = Input(shape=(input_shape[0], input_shape[1], input_shape[2], ))
    image_input2 = Input(shape=(input_shape[0], input_shape[1], input_shape[2], ))
    #for input1
    x = Conv2D(32, (3, 3), strides=(2, 2))(img_input1)
    x_dense = Dense(2, activation='softmax', name='predictions')(x)
    x_loss_value = my_categorical_crossentropy_layer(x)[input1_y_true, input1_y_pred]
    x_model = Model(inputs=img_input1, outputs=x_loss_value)
    #for input2
    y = Conv2D(32, (3, 3), strides=(2, 2))(image_input2)
    y_dense = Dense(2, activation='softmax', name='predictions')(y)
    y_loss_value = my_categorical_crossentropy_layer(y)[input2_y_true, input2_y_pred]
    y_model = Model(inputs=img_input2, outputs=y_loss_value)
    concat = concatenate([x_model.output, y_model.output])
    final_dense = Dense(2, activation='softmax')(concat)
    # Create model.
    model = Model(inputs=[img_input1,image_input2], output = final_dense)
    return model
model.compile(optimizer = optimizers.adam(lr=1e-7), loss = final_loss, metrics = ['accuracy'])
Most of the related solutions I found just customize the final loss and pass it via Model.compile(loss=customize_loss).
However, I need to apply different losses to different inputs. I'm trying to use a custom layer like this to get the loss value for the final loss calculation:
class my_categorical_crossentropy_layer1(Layer):
    def __init__(self, **kwargs):
        self.is_placeholder = True
        super(my_categorical_crossentropy_layer1, self).__init__(**kwargs)
    def my_categorical_crossentropy_loss(self, y_true, y_pred):
        y_pred = K.constant(y_pred) if not K.is_tensor(y_pred) else y_pred
        y_true = K.cast(y_true, y_pred.dtype)
        return K.categorical_crossentropy(y_true, y_pred, from_logits=from_logits)
    def call(self, y_true, y_pred):
        loss = self.my_categorical_crossentropy_loss(y_true, y_pred)
        self.add_loss(loss, inputs=(y_true, y_pred))
        return loss
But inside the Keras model, I can't figure out how to get the y_true and y_pred of the current epoch/batch for my loss layer.
So I can't add x = my_categorical_crossentropy_layer()[y_true, y_pred] to my model.
Is there any way to do this kind of calculation inside a Keras model?
Further, can Keras access the previous epoch's training loss or validation loss during the training process?
I want to use the previous epoch's training loss as the weight in the final loss.
This is my proposal.
Yours is a double binary classification problem that you want to solve with a single fit. The first thing to notice is dimensionality: your input is 4D while your target is 2D (one-hot encoded), so the network needs something to reduce dimensionality, for example Flatten or global pooling.
After that, you can create a single model with two inputs and two outputs and use two losses. In your case, the losses are weighted categorical_crossentropy. Keras supports this out of the box through the loss_weights parameter of compile. To reproduce the formula loss1*1 + loss2*W, set the weights to [1, W]. You can also combine loss_weights with a different loss per output by passing loss=[loss1, loss2, ...]; the losses are then linearly combined using the weights given in loss_weights.
Below is a working example:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, GlobalMaxPool2D, Dense
from tensorflow.keras.models import Model
input_shape = (28,28,3)
n_sample = 10
# create dummy data
X1 = np.random.uniform(0,1, (n_sample,)+input_shape) # 4d
X2 = np.random.uniform(0,1, (n_sample,)+input_shape) # 4d
y1 = tf.keras.utils.to_categorical(np.random.randint(0,2, n_sample)) # 2d
y2 = tf.keras.utils.to_categorical(np.random.randint(0,2, n_sample)) # 2d
def mymodel(input_shape, weight):
    img_input1 = Input(shape=(input_shape[0], input_shape[1], input_shape[2], ))
    img_input2 = Input(shape=(input_shape[0], input_shape[1], input_shape[2], ))
    # for input1
    x = Conv2D(32, (3, 3), strides=(2, 2))(img_input1)
    x = GlobalMaxPool2D()(x) # pass from 4d to 2d
    x = Dense(2, activation='softmax', name='predictions1')(x)
    # for input2
    y = Conv2D(32, (3, 3), strides=(2, 2))(img_input2)
    y = GlobalMaxPool2D()(y) # pass from 4d to 2d
    y = Dense(2, activation='softmax', name='predictions2')(y)
    # Create model
    model = Model([img_input1,img_input2], [x,y])
    # total loss = 1 * loss(predictions1) + weight * loss(predictions2)
    model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'],
                  loss_weights=[1,weight])
    return model
weight = 0.3
model = mymodel(input_shape, weight)
model.summary()
model.fit([X1,X2], [y1,y2], epochs=2)
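To make the formula loss1*1 + loss2*W from the answer concrete, here is a small pure-Python sketch. It does not use Keras: categorical_crossentropy below is a hand-written stand-in, and the predictions are made-up numbers chosen only for illustration.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy over a batch of one-hot targets
    (hand-written stand-in, not the Keras function)."""
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        total += -sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
    return total / len(y_true)

# Made-up targets and softmax outputs for the two heads.
y1_true = [[1, 0], [0, 1]]
y1_pred = [[0.9, 0.1], [0.2, 0.8]]
y2_true = [[0, 1], [1, 0]]
y2_pred = [[0.4, 0.6], [0.7, 0.3]]

weight = 0.3  # same role as W in loss_weights=[1, W]
loss1 = categorical_crossentropy(y1_true, y1_pred)
loss2 = categorical_crossentropy(y2_true, y2_pred)
total_loss = 1.0 * loss1 + weight * loss2

print(loss1, loss2, total_loss)
```

With weight = 0.3 the second head still contributes to the gradient, but errors on it count for less than a third of what the same error on the first head would.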
I want to customize the fit function of the model in order to apply the gradient descent on the weights only if the model improved its predictions on the validation data. The reason for this is that I want to prevent overfitting.
According to this guide it should be possible to customize the fit function of the model. However, the following code runs into errors:
class CustomModel(tf.keras.Model):
    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)
        ### check and apply gradient
        Y_pred_val = self.predict(X_val) # this does not work
        acc_val = calculate_accuracy(Y_val, Y_pred_val)
        if acc_val > last_acc_val:
            self.optimizer.apply_gradients(zip(gradients, trainable_vars))
        ###
        self.compiled_metrics.update_state(y, y_pred)
        return_obj = {m.name: m.result() for m in self.metrics}
        return_obj["acc_val"] = acc_val
        return return_obj
How could it be possible to evaluate the model inside the fit function?
You don't have to customize fit() for this. You can just write a custom training loop. Here is how I did it:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
from tensorflow.keras import Model
import tensorflow as tf
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, Concatenate
import tensorflow_datasets as tfds
from tensorflow.keras.regularizers import l1, l2, l1_l2
from collections import deque
dataset, info = tfds.load('mnist',
                          with_info=True,
                          split='train',
                          as_supervised=False)
TAKE = 1_000
data = dataset.map(lambda x: (tf.cast(x['image'], tf.float32),
                              x['label'])).shuffle(TAKE).take(TAKE)
len_train = int(8e-1*TAKE)
train = data.take(len_train).batch(8)
test = data.skip(len_train).take(info.splits['train'].num_examples - len_train).batch(8)
class CNN(Model):
    def __init__(self):
        super(CNN, self).__init__()
        self.layer1 = Dense(32, activation=tf.nn.relu,
                            kernel_regularizer=l1(1e-2),
                            input_shape=info.features['image'].shape)
        self.layer2 = Conv2D(filters=16,
                             kernel_size=(3, 3),
                             strides=(1, 1),
                             activation='relu',
                             input_shape=info.features['image'].shape)
        self.layer3 = MaxPooling2D(pool_size=(2, 2))
        self.layer4 = Conv2D(filters=32,
                             kernel_size=(3, 3),
                             strides=(1, 1),
                             activation=tf.nn.elu,
                             kernel_initializer=tf.keras.initializers.glorot_normal)
        self.layer5 = MaxPooling2D(pool_size=(2, 2))
        self.layer6 = Flatten()
        self.layer7 = Dense(units=64,
                            activation=tf.nn.relu,
                            kernel_regularizer=l2(1e-2))
        self.layer8 = Dense(units=64,
                            activation=tf.nn.relu,
                            kernel_regularizer=l1_l2(l1=1e-2, l2=1e-2))
        self.layer9 = Concatenate()
        self.layer10 = Dense(units=info.features['label'].num_classes)
    def call(self, inputs, training=None, **kwargs):
        b = self.layer1(inputs)
        a = self.layer2(inputs)
        a = self.layer3(a)
        a = self.layer4(a)
        a = self.layer5(a)
        a = self.layer6(a)
        a = self.layer8(a)
        b = self.layer7(b)
        b = self.layer6(b)
        x = self.layer9([a, b])
        x = self.layer10(x)
        return x
cnn = CNN()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
train_loss = tf.keras.metrics.Mean()
test_loss = tf.keras.metrics.Mean()
train_acc = tf.keras.metrics.SparseCategoricalAccuracy()
test_acc = tf.keras.metrics.SparseCategoricalAccuracy()
optimizer = tf.keras.optimizers.Nadam()
template = 'Epoch {:3} Train Loss {:7.4f} Test Loss {:7.4f} ' \
'Train Acc {:6.2%} Test Acc {:6.2%} '
epochs = 5
early_stop = epochs//50
loss_hist = deque()
acc_hist = deque(maxlen=1)
acc_hist.append(0)
for epoch in range(1, epochs + 1):
    train_loss.reset_states()
    test_loss.reset_states()
    train_acc.reset_states()
    test_acc.reset_states()
    for images, labels in train:
        with tf.GradientTape() as tape:
            logits = cnn(images, training=True)
            loss = loss_object(labels, logits)
        train_loss(loss)
        train_acc(labels, logits)
        # accuracy of the current training batch only
        current_acc = tf.metrics.SparseCategoricalAccuracy()(labels, logits)
        # apply the gradients only if this batch beats the stored accuracy
        if tf.greater(current_acc, acc_hist[-1]):
            print('IMPROVEMENT.')
            gradients = tape.gradient(loss, cnn.trainable_variables)
            optimizer.apply_gradients(zip(gradients, cnn.trainable_variables))
            acc_hist.append(current_acc)
    for images, labels in test:
        logits = cnn(images, training=False)
        loss = loss_object(labels, logits)
        test_loss(loss)
        test_acc(labels, logits)
    print(template.format(epoch,
                          train_loss.result(),
                          test_loss.result(),
                          train_acc.result(),
                          test_acc.result()))
    # note: loss_hist is never appended to in this snippet, so this check never fires
    if len(loss_hist) > early_stop and loss_hist.popleft() < min(loss_hist):
        print('Early stopping. No validation loss decrease in %i epochs.' % early_stop)
        break
Output:
IMPROVEMENT.
IMPROVEMENT.
IMPROVEMENT.
IMPROVEMENT.
Epoch 1 Train Loss 21.1698 Test Loss 21.3391 Train Acc 37.13% Test Acc 38.50%
IMPROVEMENT.
IMPROVEMENT.
IMPROVEMENT.
Epoch 2 Train Loss 13.8314 Test Loss 12.2496 Train Acc 50.88% Test Acc 52.50%
Epoch 3 Train Loss 13.7594 Test Loss 12.5884 Train Acc 51.75% Test Acc 53.00%
Epoch 4 Train Loss 13.1418 Test Loss 13.2374 Train Acc 52.75% Test Acc 51.50%
Epoch 5 Train Loss 13.6471 Test Loss 13.3157 Train Acc 49.63% Test Acc 51.50%
Here's the part that did the job. acc_hist is a deque of length 1 that holds the last improved accuracy; the gradients are applied only when the accuracy of the current batch is greater than that value, and skipped otherwise.
for images, labels in train:
    with tf.GradientTape() as tape:
        logits = cnn(images, training=True)
        loss = loss_object(labels, logits)
    train_loss(loss)
    train_acc(labels, logits)
    current_acc = tf.metrics.SparseCategoricalAccuracy()(labels, logits)
    if tf.greater(current_acc, acc_hist[-1]):
        print('IMPROVEMENT.')
        gradients = tape.gradient(loss, cnn.trainable_variables)
        optimizer.apply_gradients(zip(gradients, cnn.trainable_variables))
        acc_hist.append(current_acc)
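Stripped of TensorFlow, the gating logic in that loop can be sketched as follows. This is only an illustration under my own assumptions: gated_updates is a hypothetical helper, the accuracies are made-up numbers, and recording the step index stands in for the call to optimizer.apply_gradients.

```python
from collections import deque

def gated_updates(batch_accuracies):
    """Return the indices of the steps at which an update would be applied.

    Hypothetical helper: appending the step index stands in for
    optimizer.apply_gradients in the real training loop.
    """
    best = deque([0.0], maxlen=1)  # holds only the last improved accuracy
    applied = []
    for step, acc in enumerate(batch_accuracies):
        if acc > best[-1]:
            applied.append(step)   # the update happens here
            best.append(acc)       # maxlen=1 evicts the old value
    return applied

# Made-up per-batch accuracies.
applied_steps = gated_updates([0.25, 0.125, 0.5, 0.5, 0.625, 0.375])
print(applied_steps)  # [0, 2, 4]
```

Because the stored value only ever increases, updates become rarer as training goes on, which seems consistent with the output above, where no IMPROVEMENT line appears after epoch 2. Note also that the comparison uses the accuracy of a training batch, not the validation accuracy asked about in the question.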
Rather than creating a custom fit, I think it would be easier to use the ModelCheckpoint callback.
What you are trying to do is keep the model that has the lowest validation error. Set the callback up to monitor validation loss with save_best_only=True. That way it will save the best model even if the network starts to overfit. Documentation is here.
If you do not get a model with satisfactory validation accuracy, then you will have to take other measures.
First, look at your training accuracy.
In my experience, you should achieve at least 95%.
If the training accuracy is good but the validation accuracy is poor and degrades as you run more epochs, that is a sign of overfitting.
You did not show the model, but if you are doing classification you will probably have dense layers, with the final layer using softmax activation.
Start out with a model with only one dense layer and see if it trains well.
If not, you may have to add additional dense hidden layers. If you do, include a dropout layer to help prevent overfitting. You might also consider using regularizers. Documentation is here.
I also find you can get improved performance if you dynamically adjust the learning rate. The ReduceLROnPlateau callback provides that capability.
Set it up to monitor validation loss and to reduce the learning rate by a factor if the loss fails to decrease. Documentation is here.
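To illustrate what the two callbacks do conceptually, here is a simplified pure-Python simulation. It is not the Keras implementation: simulate_callbacks is a hypothetical helper, the validation losses are made up, and details such as min_delta and cooldown are left out.

```python
def simulate_callbacks(val_losses, lr=1e-3, factor=0.5, patience=2):
    """Simplified sketch of 'keep the best epoch' plus 'reduce LR on plateau'.

    Hypothetical helper, not Keras code: returns the epoch with the lowest
    validation loss and the learning rate in effect after each epoch.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0            # epochs since the last improvement
    lr_history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # A checkpoint callback would save the model at this point.
            best_loss, best_epoch = loss, epoch
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr *= factor    # plateau detected: shrink the learning rate
                wait = 0
        lr_history.append(lr)
    return best_epoch, lr_history

# Made-up validation losses for seven epochs.
val_losses = [0.90, 0.70, 0.65, 0.66, 0.68, 0.64, 0.67]
best_epoch, lr_history = simulate_callbacks(val_losses)
print(best_epoch, lr_history)
```

In this run the loss stalls at epochs 4 and 5, so the learning rate is halved after epoch 5, and the best model is the one from epoch 6.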
Actually, I want to use different loss functions in the training and validation phases. I tried in_train_phase but it doesn't work.
So I wonder: can I disable the val_loss calculation?
Below is a model with a custom loss function:
# Build a model
inputs = Input(shape=(128,))
layer1 = Dense(64, activation='relu')(inputs)
layer2 = Dense(64, activation='relu')(layer1)
predictions = Dense(10, activation='softmax')(layer2)
model = Model(inputs=inputs, outputs=predictions)
# Define custom loss
def custom_loss(layer):
    # Create a loss function that adds the MSE loss to the mean of all squared activations of a specific layer
    def loss(y_true,y_pred):
        return K.mean(K.square(y_pred - y_true) + K.square(layer), axis=-1)
    # Return a function
    return loss
# Compile the model
model.compile(optimizer='adam',
              loss=custom_loss(layer), # Call the loss function with the selected layer
              metrics=['accuracy'])
# train
model.fit(data, labels)
I have a model consisting of one encoder and two decoders, with two loss functions:
input_shape = (384, 512, 3)
model = Model(inputs=input, outputs=[1_features, 2_features])
model = build_model(input_shape, 3)
losses = {
    "loss1_output": "categorical_crossentropy",
    "loss2_output": "categorical_crossentropy"}
lossWeights = {"loss1_output": 1.0, "loss2_output": 1.0}
EPOCHS = 50
INIT_LR = 1e-3
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(optimizer=opt, loss=losses, loss_weights=lossWeights,
              metrics=["accuracy"])
I would like to combine the values of both losses into a single loss value and backpropagate the result of the combination.
My question is close to this one, which I read and tried; I found that the model calls the loss function once for each branch (output).
Edits below
I am in the process of learning about artificial neural networks using the Keras library and in order to ensure that I have a good understanding of the basics of neural network classification, I have been trying to reproduce a neural network written with Keras using only tensorflow. However, I have run into some problems.
training_epochs = 100
n_input = 11
n_hidden_1 = 6
n_hidden_2 = 6
n_output = 1
classifier = Sequential()
classifier.add(Dense(output_dim=n_hidden_1, init='uniform', activation='relu', input_dim=n_input))
classifier.add(Dense(output_dim=n_hidden_2, init='uniform', activation='relu'))
classifier.add(Dense(output_dim=n_output, init='uniform', activation='sigmoid'))
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
classifier.fit(X_train, y_train, batch_size=10, nb_epoch=training_epochs)
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
cm = confusion_matrix(y_test, y_pred)
print(cm)
So essentially I am using a neural network with 2 hidden layers of size 6, an input layer of size 11, and an output of size 1. My output uses the sigmoid function to generate probabilities in order to classify training data into binary categories. I tried to reproduce this with tensorflow as follows:
training_epochs = 100
n_input = 11
n_hidden_1 = 6
n_hidden_2 = 6
n_output = 1
def neuralNetwork(x, weights):
    layer_1 = tf.matmul(x, weights['h1'])
    layer_1 = tf.nn.relu(layer_1)
    layer_2 = tf.matmul(layer_1, weights['h2'])
    layer_2 = tf.nn.relu(layer_2)
    output_layer = tf.matmul(layer_2, weights['output'])
    return output_layer
weights = {
    'h1': tf.Variable(tf.random_uniform([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_uniform([n_hidden_1, n_hidden_2])),
    'output': tf.Variable(tf.random_uniform([n_hidden_2, n_output]))
}
x = tf.placeholder('float', [None, n_input]) # [?, 11]
y = tf.placeholder('float', [None, n_output]) # [?, 1]
logits = neuralNetwork(x, weights)
prediction = tf.nn.softmax(logits)
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits,labels=y))
optimizer = tf.train.AdamOptimizer().minimize(cost)
init = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(init)
    for epoch in range(training_epochs):
        loss, accuracy = session.run([optimizer, cost], feed_dict={x:X_train, y:y_train})
        print('Epoch: {} Acc: {}'.format(epoch+1, accuracy))
    print('Model has completed training.')
However, I keep getting the error:
Cannot feed value of shape (8000,) for Tensor 'Placeholder_1:0', which has shape '(?, 1)'
My input data has 8000 rows with 11 columns and my output data has 8000 rows and 1 column. In order to try to reshape my data, I tried feeding it in row by row, but I kept getting more errors. Am I going about this the right way? Any help would be appreciated!
Edit: So I updated my code following the given suggestions. I am now getting output for accuracy, however, it seems to finish at around 4-5%. Furthermore, the accuracy also seems to decrease over time rather than improving. When I increase the number of training epochs to 200, the accuracy dips even lower (to around 2%).
Epoch: 1 Acc: 7.641509056091309
...
...
Epoch: 100 Acc: 4.339457035064697
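One thing worth noting about the numbers above: the training loop fetches [optimizer, cost], so the value printed as "Acc" is the cross-entropy cost, not an accuracy. In addition, tf.nn.softmax over a single output unit always yields 1.0, so it cannot serve as a binary prediction. The pure-Python sketch below illustrates both points with made-up logits and labels; sigmoid and softmax are hand-written stand-ins, not the TensorFlow functions.

```python
import math

def sigmoid(z):
    """Logistic function: maps a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(row):
    """Softmax over one row of logits (numerically stabilised)."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

# Made-up logits with shape (4, 1), mimicking n_output = 1, and binary labels.
logits = [[-2.0], [0.5], [3.0], [-0.1]]
labels = [0, 1, 1, 1]

# Softmax over a single unit is always 1.0, whatever the logit is.
softmax_out = [softmax(row)[0] for row in logits]

# Sigmoid gives a usable probability; threshold it at 0.5 as in the Keras version.
probs = [sigmoid(row[0]) for row in logits]
preds = [1 if p > 0.5 else 0 for p in probs]
accuracy = sum(int(p == t) for p, t in zip(preds, labels)) / len(labels)

print(softmax_out, preds, accuracy)
```

In the TensorFlow 1.x graph from the question, the analogous steps would be to use a sigmoid on the logits for the prediction and to fetch a separately defined accuracy tensor alongside the cost, though the exact wiring is left out here.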