How to train a deep neural network with a custom loss - python

I am interested in how to train a deep neural network with a custom loss function. I have seen posts about this on Stack Overflow, but they aren't answered. I have downloaded VGG16, frozen its weights, and added my own head. Now I want to train that network with a custom loss. How can I do that?

Here is a custom RMSE loss in PyTorch. I hope this gives you a concrete idea of how to implement a custom loss function. You create a class that inherits from nn.Module and define the initialization and the forward pass.
import torch
import torch.nn as nn

class RMSELoss(nn.Module):
    def __init__(self, eps=1e-9):
        super().__init__()
        self.mse = nn.MSELoss()
        self.eps = eps  # small constant that keeps the sqrt differentiable at zero error

    def forward(self, yhat, y):
        loss = torch.sqrt(self.mse(yhat, y) + self.eps)
        return loss
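For the setup described in the question (frozen VGG16 backbone plus a custom head), the custom loss is then used like any built-in criterion inside the training loop. A minimal sketch, where the head size, optimizer settings, and loader are illustrative assumptions rather than code from the question:

import torch
import torch.nn as nn
from torchvision import models

# Frozen VGG16 with its last classifier layer replaced by a small regression head
backbone = models.vgg16(pretrained=True)
for p in backbone.parameters():
    p.requires_grad = False
backbone.classifier[6] = nn.Linear(4096, 1)  # the new, trainable "head"

criterion = RMSELoss()
optimizer = torch.optim.Adam(backbone.classifier[6].parameters(), lr=1e-3)

for x_batch, y_batch in loader:  # `loader` is assumed to be your DataLoader
    optimizer.zero_grad()
    yhat = backbone(x_batch).squeeze(1)
    loss = criterion(yhat, y_batch)  # the custom loss drives backpropagation
    loss.backward()
    optimizer.step()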

You can simply define a function with two input parameters (true values, predicted values) and then calculate the loss from them in whatever way you like.
Here is a code sample:
def custom_loss(y_true, y_pred):
    return tf.losses.mean_squared_error(y_true, y_pred)
I have used the built-in MSE from TensorFlow in this example, but you can do the calculation manually here instead.
Compile your model with this loss function.
model.compile(
    optimizer=your_optimizer,
    loss=custom_loss
)
You can also define your own customized metric to monitor during training.
def custom_metric(y_true, y_pred):
    return calculate_your_metric(y_true, y_pred)
Finally, compile with it:
model.compile(
    optimizer=your_optimizer,
    loss=custom_loss,
    metrics=[custom_metric]
)
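Putting the pieces together for the setup in the question (frozen VGG16 plus a custom head), a minimal end-to-end sketch could look like the following; the head layers, input size, optimizer settings, and the placeholder metric are assumptions for illustration, not part of the original question.

import tensorflow as tf

def custom_loss(y_true, y_pred):
    return tf.losses.mean_squared_error(y_true, y_pred)

def custom_metric(y_true, y_pred):
    return tf.reduce_mean(tf.abs(y_true - y_pred))  # placeholder metric

# Frozen VGG16 backbone with an illustrative head
base = tf.keras.applications.VGG16(include_top=False, input_shape=(224, 224, 3))
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(1)
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    loss=custom_loss,
    metrics=[custom_metric]
)
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10)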

There are several examples and repositories showing how to implement perceptual loss, which sounds like what you are referring to. Of course, you can generalize and learn from some of these approaches for different models depending on your problem. If you do so, I recommend writing about it and sharing. I don't see many examples other than ones using some pretrained VGG model, and breaking that mold might be a nice contribution! Anyway, you might find these other answers useful:
Implement perceptual loss with pretrained VGG using keras
VGG, perceptual loss in keras
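For reference, the common pattern in those answers is to compare activations of a fixed, pretrained VGG16 between the prediction and the target rather than comparing raw pixels. A minimal sketch of that idea (the choice of block3_conv3 as the feature layer is just an assumption):

import tensorflow as tf

# Fixed feature extractor built from a pretrained VGG16 (layer choice is illustrative)
vgg = tf.keras.applications.VGG16(include_top=False, weights='imagenet')
feature_extractor = tf.keras.Model(
    inputs=vgg.input,
    outputs=vgg.get_layer('block3_conv3').output
)
feature_extractor.trainable = False

def perceptual_loss(y_true, y_pred):
    # Compare images in VGG feature space instead of pixel space
    true_features = feature_extractor(y_true)
    pred_features = feature_extractor(y_pred)
    return tf.reduce_mean(tf.square(true_features - pred_features))

# model.compile(optimizer='adam', loss=perceptual_loss)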

Related

Keras loss and metrics values do not match with the same function in each epoch

I am using Keras with a custom loss function like below:
def custom_fn(y_true, y_pred):
    # changing y_true, y_pred values systematically
    return mean_absolute_percentage_error(y_true, y_pred)
Then I am calling model.compile(loss=custom_fn) and model.fit(X, y, ..., validation_data=(X_val, y_val), ...).
Keras is then saving loss and val_loss in model history. As a sanity check, when the model finishes training, I am using model.predict(X_val) so I can calculate validation loss manually with my custom_fn using the trained model.
I am saving the model with the best epoch using this callback:
callbacks.append(ModelCheckpoint(path, save_best_only=True, monitor='val_loss', mode='min'))
So after calculating this, the validation loss should match Keras' val_loss value for the best epoch. But this is not happening.
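(For clarity, the manual check described above amounts to something like the following sketch; the backend evaluation is an assumption about how custom_fn's tensor output gets turned into a number, not the exact code used.)

from tensorflow.keras import backend as K

# Recompute the validation loss by hand with the trained (best) model
preds = model.predict(X_val)
manual_val_loss = K.eval(K.mean(custom_fn(K.constant(y_val), K.constant(preds))))
print(manual_val_loss)  # expected to equal the best epoch's val_loss, but it does not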
As another attempt to figure this issue out, I am also doing this:
model.compile(loss=custom_fn, metrics=[custom_fn])
And to my surprise, val_loss and val_custom_fn do not match (nor do loss and loss_custom_fn, for that matter).
This is really strange: my custom_fn is essentially Keras' built-in mape with the y_true and y_pred slightly manipulated. What is going on here?
PS: the layers I am using are LSTM layers and a final Dense layer, but I think this information is not relevant to the problem. I am also using regularisation as a hyperparameter, but not dropout.
Update
Even removing custom_fn and using Keras' built-in mape as both the loss function and the metric, like so:
model.compile(loss='mape', metrics=['mape'])
and, for simplicity, removing the ModelCheckpoint callback has the same effect: val_loss and val_mape are not equal for each epoch. This is extremely strange to me. I am either missing something or there is a bug in the Keras code... the former is probably more realistic.
This blog post suggests that Keras adds any regularisation penalty used in training when calculating the validation loss, and obviously no regularisation is applied when calculating the metric of choice. This is why it occurs with any loss function, as stated in the question.
This is something I could not find any documentation on from Keras. However, it seems to hold up: when I remove all regularisation hyperparameters, val_loss and val_custom_fn match exactly in each epoch.
An easy workaround is either to use custom_fn as a metric and save the best model based on that metric (val_custom_fn) rather than on val_loss (a sketch of this is shown below), or else to loop through the epochs manually and compute the correct val_loss yourself after training each epoch. The latter seems to make more sense, since there is no reason to include custom_fn both as a metric and as a loss function.
If anyone can find any evidence of this in the Keras documentation, that would be helpful.
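For completeness, the first workaround is just a change to the callback from the question: monitor the unregularised metric instead of the regularised validation loss. A minimal sketch, assuming the same custom_fn, model, and path as above:

from tensorflow.keras.callbacks import ModelCheckpoint

model.compile(loss=custom_fn, metrics=[custom_fn])

# Save the best model based on the metric, which does not include the
# regularisation penalty, rather than on val_loss, which does.
checkpoint = ModelCheckpoint(path, save_best_only=True,
                             monitor='val_custom_fn', mode='min')

model.fit(X, y, validation_data=(X_val, y_val), callbacks=[checkpoint])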

When calling a Keras Model there is no difference between having @tf.function or not, but there is a difference when building a low-level model

Is this issue a bug?
Compare the following two pieces of code. If @tf.function is included, then both work well. If @tf.function is not included, then the custom low-level model does not train.
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# @tf.function
def propagate(x_batch, y_batch):
    """
    Complete both forward and backward propagation on our
    batches.
    """
    # Record operations to automatically obtain the gradients
    with tf.GradientTape() as tape:
        logits = model(x_batch)
        # Calculates the total loss of the entire network
        loss = loss_fn(y_batch, tf.nn.softmax(logits))
        # Compute the accuracy of our model
        # (Convert our logits to a softmax distribution)
        accuracy(y_batch, tf.nn.softmax(logits))
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss
Compare this to when we define a custom low-level model:
class Model(object):
    def __init__(self):
        self.weights, self.biases = self.initialize_weights_and_biases()
        self.trainable_vars = list(self.weights.values()) + list(self.biases.values())

    def initialize_weights_and_biases(self):
        # ... (body omitted in the question)
        return out_layer
This is a very good question, and there has been a very interesting conversation about it on GitHub.
A Google engineer (GitHub ID alextp) has clarified this question on GitHub.
Providing the clarification here for the benefit of the Stack Overflow community.
The problem is caused by applying softmax and then cross entropy separately, instead of using softmax_cross_entropy_with_logits or the equivalent.
softmax-then-cross-entropy is really numerically unstable (you throw away most of the bits of your logits when doing softmax) and should never be used.
Because of this, the Keras cross-entropy function has logic to "undo" the softmax in graph mode.
So, the solution is to use softmax_cross_entropy_with_logits or the equivalent, instead of applying softmax and then cross entropy separately.
For a detailed and insightful conversation about this issue, please refer to this link.
Happy Learning!
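As a concrete illustration of that fix (not the exact code from the GitHub issue), the loss can be computed directly from the logits, either with Keras' from_logits option or with tf.nn.softmax_cross_entropy_with_logits; model and optimizer are assumed to be the ones defined in the question.

import tensorflow as tf

loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

@tf.function
def propagate(x_batch, y_batch):
    with tf.GradientTape() as tape:
        logits = model(x_batch)
        # Pass raw logits to the loss; the softmax is folded into the loss
        # computation in a numerically stable way.
        loss = loss_fn(y_batch, logits)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss

# Low-level equivalent of the same idea:
# loss = tf.reduce_mean(
#     tf.nn.softmax_cross_entropy_with_logits(labels=y_batch, logits=logits))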

Use neural network to learn distribution of values for classification

The aim is to classify 1-D inputs using a neural network. There are two classes to be distinguished, A and B. Each input used to determine the class is a number between 0.0 and 1.0.
The input values for class A are evenly distributed between 0 and 1, while the input values for class B all lie in the range 0.4 to 0.6 (the original post shows histograms of both).
Now I want to train a neural network that can learn to classify values in the range 0.4 to 0.6 as B and the rest as A. So I need a neural network that can approximate the upper and lower bounds of a class. My previous attempts at doing so have been unsuccessful: the neural network always returns a 50% probability for any input, across the board, and the loss does not decrease over the epochs.
Using TensorFlow and Keras in Python, I have trained simple models such as the following:
model = keras.Sequential([
    keras.layers.Dense(1),
    keras.layers.Dense(5, activation=tf.nn.relu),
    keras.layers.Dense(5, activation=tf.nn.relu),
    keras.layers.Dense(2, activation=tf.nn.softmax)
])
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])
(full training script linked below)
On a side note, I would imagine the neural network working like this: some neurons fire only below 0.4, some only above 0.6. If either of those groups of neurons fires, it's class A; if neither fires, it's class B. Unfortunately, that's not what is happening.
How does one go about classifying the inputs described above using neural networks?
--
Example script: https://pastebin.com/xNJUqXyU
Several things could be changed in your model architecture here.
First, the loss should not be loss='mean_squared_error'; it is better to use loss='binary_crossentropy', which is better suited to binary classification problems. I will not explain the difference here, as it is something that can be looked up easily in the Keras documentation.
You also need to change the definition of your last layer. You only need one final node, which will give the probability of belonging to class 1 (hence a node for the probability of belonging to class 0 is redundant), and you should be using activation=tf.nn.sigmoid instead of softmax.
Something else you can do is define class weights to deal with the imbalance of your data. Given how you define your sample here, it seems that weighting class 0 four times as heavily as class 1 would make sense.
Once all these changes are made, you should be left with something that looks like this:
model = keras.Sequential([
    keras.layers.Dense(1),
    keras.layers.Dense(5, activation=tf.nn.relu),
    keras.layers.Dense(5, activation=tf.nn.relu),
    keras.layers.Dense(1, activation=tf.nn.sigmoid)
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(np.array(inputs_training), np.array(targets_training),
          epochs=5, verbose=1, class_weight={0: 4, 1: 1})
This gives me 96% accuracy on the validation set, and each epoch does reduce the loss.
(On a side note, it seems to me that a Decision Tree would be much better suited here, as it would perform the classification by explicitly learning the kind of bounds you described; a quick sketch of that alternative follows.)
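To make that side note concrete, here is a minimal scikit-learn sketch on synthetic data matching the described distributions; the sample sizes and random seed are arbitrary assumptions, not values from the linked script.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Class A (label 0): uniform on [0, 1]; class B (label 1): uniform on [0.4, 0.6]
X_a = rng.uniform(0.0, 1.0, size=1000)
X_b = rng.uniform(0.4, 0.6, size=1000)
X = np.concatenate([X_a, X_b]).reshape(-1, 1)
y = np.concatenate([np.zeros(len(X_a)), np.ones(len(X_b))])

# A shallow tree only needs two splits (around 0.4 and 0.6) for this 1-D problem
clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(clf.predict([[0.5], [0.9]]))  # expected: roughly [1., 0.]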

How to get predictions on X_test given the DNN?

I finished building the DNN model for the Titanic dataset. Given that, how do I make predictions on X_test? My code can be accessed through my GitHub:
https://github.com/isaac-altair/Titanic-Dataset
Thanks
When you trained your model you asked TensorFlow to evaluate your train_op. Your train_op is your optimizer's training step, e.g.:
train_op = tf.train.AdamOptimizer(...).minimize(cost)
You ran something like this to train the model:
sess.run([train_op], feed_dict={x:data, y:labels})
The train_op depends on things like the gradients and the operations that update the weights, so all of these things happened when you ran the train_op.
At inference time you simply ask it to perform different calculations. You can have the optimizer defined, but if you don't ask it to run the optimizer, it won't perform any of the actions that the optimizer depends on. You probably have an output of the network called logits (you could call it anything, but logits is the most common name and the one seen in most tutorials). You might also have defined an op called accuracy which computes the accuracy of the batch. You can get the values of those with a similar request to TensorFlow:
sess.run([logits, accuracy], feed_dict={x:data, y:labels})
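To answer the original question directly (predictions on X_test without labels), you only run the ops the predictions depend on. A rough sketch, assuming x is your input placeholder and logits is the network output defined in your graph:

# Define the prediction ops once, then evaluate them for the test set;
# no labels and no train_op are involved.
pred_op = tf.argmax(logits, axis=1)
probs_op = tf.nn.softmax(logits)

predictions = sess.run(pred_op, feed_dict={x: X_test})      # hard class labels
probabilities = sess.run(probs_op, feed_dict={x: X_test})   # class probabilities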
Almost any tutorial will demonstrate this. My favorite tutorials are here: https://github.com/aymericdamien/TensorFlow-Examples

Universal Tensorflow Wrapper for Model training

I want to build a TensorFlow wrapper to train models. The idea is that you can define your model in a function, pass it to an object/wrapper, and it will do the rest, so you don't have to code everything from scratch every time. I will make it clear with some pseudocode:
def model():
    # define your tf graph/structure here
    return output
And then you will have a class into which you can pass your model, training data, and validation data:
class tf_wrapper():
    def __init__(self, model, training_data, valid_data):
        # init stuff
        ...

    def train(self):
        # code to train the model
        ...
The training code should look like the standard loop found in many tutorials:
for i in range(epochs):
    sess.run(train_op, feed_dict={placeholder_X: batch_X, placeholder_Y: batch_Y})
What I struggle with right now is that there are different kinds of model structures, loss functions, and input pipelines. For example, the loss function for a classification task is different from that of a regression task (cross entropy vs. MSE), as is the calculation of accuracy, and the way you feed data to a CNN is different from an RNN. What is the best way to solve this problem? (One possible shape for such a wrapper is sketched below.)
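To make the idea above concrete, one way to handle the task-specific differences is to inject them (loss, metric, data pipeline) into the wrapper as parameters rather than hard-coding them. A rough sketch under the assumptions that model_fn returns a tf.keras.Model and that the data are batched iterables of (x, y) pairs:

import tensorflow as tf

class TFWrapper:
    """Training wrapper: the task-specific parts are passed in, not hard-coded."""

    def __init__(self, model_fn, loss_fn, metric_fn, training_data, valid_data):
        self.model = model_fn()        # user-defined model/graph
        self.loss_fn = loss_fn         # e.g. cross entropy for classification, MSE for regression
        self.metric_fn = metric_fn     # e.g. accuracy or MAE
        self.training_data = training_data
        self.valid_data = valid_data
        self.optimizer = tf.keras.optimizers.Adam()

    def train(self, epochs):
        for _ in range(epochs):
            for x_batch, y_batch in self.training_data:
                with tf.GradientTape() as tape:
                    preds = self.model(x_batch, training=True)
                    loss = self.loss_fn(y_batch, preds)
                grads = tape.gradient(loss, self.model.trainable_variables)
                self.optimizer.apply_gradients(zip(grads, self.model.trainable_variables))

# Usage: classification vs. regression only changes what you pass in, e.g.
# TFWrapper(build_cnn, tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
#           tf.keras.metrics.SparseCategoricalAccuracy(), train_ds, valid_ds)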
