I'm trying to train a simple Keras model on some data using
Approach 1:
model.train_on_batch(x, y)
and Approach 2:
with tf.GradientTape() as g:
    g.watch(model.variables)
    loss = my_loss(
        y_true=y,
        y_pred=model(x)
    )
gradients = g.gradient(loss, model.variables)
opt.apply_gradients(
    zip(gradients, model.variables)
)
Even when the optimizer (Adam with a fixed learning rate) and the loss are the same, I do not get exactly the same behavior. Is this expected? (i.e. is train_on_batch doing some additional work?)
In Approach 1, train_on_batch runs a single gradient update on a single batch of data, once. The point of using train_on_batch is usually that you want to do other things between batches.
In Approach 2, the gradient update depends on how often optimizer.apply_gradients(zip(grads, model.trainable_weights)) is called in the training loop.
Since the gradient updates happen differently in the two cases, the model behavior can differ; a sketch of a manual loop that mirrors train_on_batch is shown below.
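For reference, here is a minimal sketch of Approach 2 written so that it performs exactly one update per batch, just like train_on_batch (assuming opt and my_loss from the question; dataset is a hypothetical iterable of (x, y) batches, and trainable variables are watched by the tape automatically, so g.watch is not needed):
for x, y in dataset:
    with tf.GradientTape() as g:
        loss = my_loss(y_true=y, y_pred=model(x, training=True))
    # one gradient update per batch, mirroring train_on_batch
    grads = g.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))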
Hope this answers your question. Happy Learning.
I have a neural network Network that has a vector output. Instead of using a typical loss function, I would like to implement my own loss function that is a method in some class. This looks something like:
class whatever:
    def __init__(self, network, optimizer):
        self.network = network
        self.optimizer = optimizer

    def cost_function(self, relevant_data):
        ...implementation of cost function with respect to output of network and relevant_data...

    def train(self, epochs, other_params):
        ...part I'm having trouble with...
The main thing I'm concerned with is taking gradients. Since I'm using my own custom loss function, do I need to implement my own gradient with respect to the cost function?
Once I do the math, I realize that if the cost is J, then the gradient of J is a fairly simple function in terms of the gradient of the final layer of the Network, i.e. it looks something like: Equation link.
If I used some traditional loss function like CrossEntropy, my backward pass would look like:
objective = nn.CrossEntropyLoss()
for epoch in range(epochs):
    optimizer.zero_grad()
    output = Network(input)
    loss = objective(output, data)
    loss.backward()
    optimizer.step()
But how do we do this in my case? My guess is something like:
for epoch in range(epochs):
    optimizer.zero_grad()
    output = Network(input)
    loss = cost_function(output, data)
    # And here is where the problem comes in
    loss.backward()
    optimizer.step()
loss.backward(), as I understand it, computes the gradients of the loss function with respect to the parameters. But can I still invoke it while using my own loss function (presumably the program doesn't know what the gradient equation is)? Do I have to implement another method/subroutine to find the gradients as well?
Which brings me to my other question: if I do want to implement the gradient calculation for my loss function, I also need the gradients with respect to the neural network's parameters. How do I obtain those? Is there a function for that?
As long as all your steps starting from the input till the loss function involve differentiable operations on PyTorch's tensors, you need not do anything extra. PyTorch builds a computational graph that keeps track of each operation, its inputs, and gradients. So, calling loss.backward() on your custom loss would still propagate gradients back correctly through the graph. A Gentle Introduction to torch.autograd from the PyTorch tutorials may be a useful reference.
After the backward pass, if you need to directly access the gradients for further processing, you can do so using the .grad attribute (so t.grad for tensor t in the graph).
Finally, if you have a specific use case for finding the gradient of an arbitrary differentiable function implemented using PyTorch's tensors with respect to one of its inputs (e.g. gradient of the loss with respect to a particular weight in the network), you could use torch.autograd.grad.
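As a concrete illustration, here is a minimal sketch with a made-up network, loss formula, and data (none of these come from the question); the point is that a loss composed of differentiable torch operations needs no hand-written gradient:
import torch
import torch.nn as nn

network = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
optimizer = torch.optim.SGD(network.parameters(), lr=0.01)

def cost_function(y_pred, y_true):
    # any composition of differentiable torch ops; autograd handles the gradient
    return ((y_pred - y_true) ** 2).mean() + 0.1 * y_pred.abs().mean()

x = torch.randn(16, 4)
y = torch.randn(16, 3)

optimizer.zero_grad()
loss = cost_function(network(x), y)
loss.backward()                      # gradients of the custom loss w.r.t. all parameters
optimizer.step()

first_weight = network[0].weight
print(first_weight.grad.shape)       # per-parameter gradients live in .grad after backward()

# alternatively, compute a gradient directly without touching .grad
loss2 = cost_function(network(x), y)
(g,) = torch.autograd.grad(loss2, first_weight)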
I am using keras with a custom loss function like below:
def custom_fn(y_true, y_pred):
    # changing y_true, y_pred values systematically
    return mean_absolute_percentage_error(y_true, y_pred)
Then I am calling model.compile(loss=custom_fn) and model.fit(X, y, ..., validation_data=(X_val, y_val), ...).
Keras is then saving loss and val_loss in model history. As a sanity check, when the model finishes training, I am using model.predict(X_val) so I can calculate validation loss manually with my custom_fn using the trained model.
I am saving the model with the best epoch using this callback:
callbacks.append(ModelCheckpoint(path, save_best_only=True, monitor='val_loss', mode='min'))
So after calculating this, the validation loss should match Keras' val_loss value for the best epoch. But this is not happening.
As another attempt to figure this issue out, I am also doing this:
model.compile(loss=custom_fn, metrics=[custom_fn])
And to my surprise, val_loss and val_custom_fn do not match (nor do loss and custom_fn on the training side, for that matter).
This is really strange; my custom_fn is essentially Keras' built-in mape with y_true and y_pred slightly manipulated. What is going on here?
PS: the layers I am using are LSTM layers and a final Dense layer, but I think this information is not relevant to the problem. I am also using regularisation as a hyperparameter, but not dropout.
Update
Even removing custom_fn and using Keras' built-in mape as both the loss function and the metric, like so:
model.compile(loss='mape', metrics=['mape'])
and, for simplicity, removing the ModelCheckpoint callback, has the same effect: val_loss and val_mape are not equal in each epoch. This is extremely strange to me. Either I am missing something or there is a bug in the Keras code; the former is probably more realistic.
This blog post suggests that Keras adds any regularisation used during training when calculating the validation loss, whereas no regularisation is applied when calculating the metric of choice. This is why the mismatch occurs with any loss function, as stated in the question.
This is something I could not find documented anywhere for Keras. However, it seems to hold up: when I remove all regularisation hyperparameters, val_loss and val_custom_fn match exactly in every epoch.
An easy workaround is to either use custom_fn as a metric and save the best model based on that metric (val_custom_fn) rather than on val_loss, or to loop through each epoch manually and calculate the correct val_loss after training each epoch. The latter seems to make more sense, since there is no reason to include custom_fn both as a metric and as a loss function. A sketch of the metric-based variant is shown below.
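For completeness, here is a minimal sketch of the metric-based workaround (path, X, y, X_val, y_val, custom_fn, and the imports are assumed to be defined as in the question; Keras names the validation metric val_custom_fn after the function):
model.compile(loss=custom_fn, metrics=[custom_fn])

# monitor the regularisation-free metric instead of val_loss
callbacks = [ModelCheckpoint(path, save_best_only=True,
                             monitor='val_custom_fn', mode='min')]

model.fit(X, y, validation_data=(X_val, y_val), callbacks=callbacks)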
If anyone can find any evidence of this in the Keras documentation that would be helpful.
I finished building the DNN model for the Titanic Dataset. Given that, how do I make predictions on X_test? My code can be accessed through my GitHub:
https://github.com/isaac-altair/Titanic-Dataset
Thanks
When you trained your model, you asked TensorFlow to evaluate your train_op. Your train_op is your optimizer, e.g.:
train_op = tf.train.AdamOptimizer(...).minimize(cost)
You ran something like this to train the model:
sess.run([train_op], feed_dict={x:data, y:labels})
The train_op depends on things like the gradients and the operations that update the weights, so all of these things happened when you ran the train_op.
At inference time you simply ask the graph to perform different calculations. You can still have the optimizer defined, but if you don't ask TensorFlow to run it, none of the operations the optimizer depends on will be performed. You probably have an output of the network called logits (you could call it anything, but logits is the most common name and the one seen in most tutorials). You might also have defined an op called accuracy which computes the accuracy of the batch. You can get the values of both with a similar request to TensorFlow:
sess.run([logits, accuracy], feed_dict={x:data, y:labels})
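To get class predictions for X_test specifically (where there are no labels to feed), a minimal sketch could look like the following, assuming x is the input placeholder, logits has one column per class, and numpy is imported as np:
logits_val = sess.run(logits, feed_dict={x: X_test})   # only the forward pass runs
predictions = np.argmax(logits_val, axis=1)            # predicted class per row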
Almost any tutorial will demonstrate this. My favorite tutorials are here: https://github.com/aymericdamien/TensorFlow-Examples
I'm building a CNN model using TensorFlow, without any frontend API such as Keras. I'm creating a VGG-16 model, using the pre-trained weights, and I want to fine-tune the last layers to serve my purpose.
Following the tutorial here, http://cv-tricks.com/tensorflow-tutorial/training-convolutional-neural-network-for-image-classification/
I re-created the training script and modified it as per my requirements. However, training does not progress: the training accuracy is stuck at 50.00% and the validation accuracy cycles through a repeating pattern of values.
A screenshot of this is attached.
I have been stuck on this for days now and can't seem to find the error. Any help is appreciated.
The code is pretty long, so here is the gist file for it.
Your cross entropy is wrong: you are comparing your logits with the softmax of your logits.
This:
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=layer_fc2,
                                                        labels=y_pred)
Should be:
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=layer_fc2,
                                                        labels=y_true)
Some things to note: I would not train on a data point and then evaluate on that same data point; your training accuracy will probably be biased by doing so. Another point to note is that tf.argmax(tf.softmax(logits)) is the same as tf.argmax(logits).
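As a quick check of that last point, here is a tiny sketch (using TF 1.x sessions, as in the rest of this code) showing that the softmax does not change the argmax, since softmax is monotonic:
import tensorflow as tf

logits = tf.constant([[2.0, 0.5, -1.0],
                      [0.1, 3.0,  0.2]])

with tf.Session() as sess:
    print(sess.run(tf.argmax(tf.nn.softmax(logits), axis=1)))  # [0 1]
    print(sess.run(tf.argmax(logits, axis=1)))                 # [0 1]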
I've been looking through the TensorFlow FullyConnected tutorial, which also uses the helper code mnist.py.
I understand the code except for one nagging piece. After training the neural net, the weights obtained from training should be used to evaluate the precision of the model on the validation (and test) data. However, I don't see that being done anywhere.
In fact, this is the only thing I see in fully_connected_feed.py:
# Evaluate against the validation set.
print('Validation Data Eval:')
do_eval(sess,
        eval_correct,
        images_placeholder,
        labels_placeholder,
        data_sets.validation)

# Evaluate against the test set.
print('Test Data Eval:')
do_eval(sess,
        eval_correct,
        images_placeholder,
        labels_placeholder,
        data_sets.test)
The do_eval() function is passed a parameter eval_correct, which seems to recalculate the logits on this new data. I've been playing around with TF for a while now, but I'm baffled by this code. Any thoughts would be great.
TensorFlow creates a graph with the weights and biases. Roughly speaking, while you train this neural net the weights and biases get changed so that it produces the expected outputs. Line 131 in fully_connected_feed.py (with tf.Graph().as_default():) tells TensorFlow to use the default graph. Therefore every line in the training loop, including the calls to the do_eval() function, uses that default graph. Since the weights obtained from training are not reset before evaluation, they are used for it.
eval_correct is the operation that is used, instead of the training operation, to evaluate the neural net without training it. This is important, because otherwise the neural net would also be trained on the evaluation data, which would result in distorted (too good) results.
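To make that concrete, here is a rough sketch of how eval_correct is built and used inside do_eval (paraphrasing the tutorial's structure rather than quoting it; num_batches, batch_size, and the exact helper names are assumptions for the example):
# number of correct predictions in a batch; depends only on logits, not on train_op
eval_correct = tf.reduce_sum(
    tf.cast(tf.nn.in_top_k(logits, labels_placeholder, 1), tf.int32))

true_count = 0
for _ in range(num_batches):
    images, labels = data_set.next_batch(batch_size)
    # only eval_correct is run here: no train_op, so no weights are updated
    true_count += sess.run(eval_correct,
                           feed_dict={images_placeholder: images,
                                      labels_placeholder: labels})
precision = true_count / (num_batches * batch_size)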