Nonexistent PyTorch gradients when dotting tensors in loss function - python

For the purposes of this MWE I'm trying to fit a linear regression using a custom loss function with multiple terms. However, I'm running into strange behavior when trying to weight the different terms in my loss function by dotting a weight vector with my losses. Just summing the losses works as expected; however, when dotting the weights and losses the backpropagation gets broken somehow and the loss function doesn't decrease.
I've tried enabling and disabling requires_grad on both tensors, but have been unable to replicate the expected behavior.
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
# Hyper-parameters
input_size = 1
output_size = 1
num_epochs = 60
learning_rate = 0.001
# Toy dataset
x_train = np.array([[3.3], [4.4], [5.5], [6.71], [6.93], [4.168],
[9.779], [6.182], [7.59], [2.167], [7.042],
[10.791], [5.313], [7.997], [3.1]], dtype=np.float32)
y_train = np.array([[1.7], [2.76], [2.09], [3.19], [1.694], [1.573],
[3.366], [2.596], [2.53], [1.221], [2.827],
[3.465], [1.65], [2.904], [1.3]], dtype=np.float32)
# Linear regression model
model = nn.Linear(input_size, output_size)
# Loss and optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
def loss_fn(outputs, targets):
    l1loss = torch.norm(outputs - targets, 1)
    l2loss = torch.norm(outputs - targets, 2)
    # This works as expected
    # loss = 1 * l1loss + 1 * l2loss
    # Loss never changes, no matter what combination of
    # requires_grad I set
    loss = torch.dot(torch.tensor([1.0, 1.0], requires_grad=False),
                     torch.tensor([l1loss, l2loss], requires_grad=True))
    return loss
# Train the model
for epoch in range(num_epochs):
    # Convert numpy arrays to torch tensors
    inputs = torch.from_numpy(x_train)
    targets = torch.from_numpy(y_train)
    # Forward pass
    outputs = model(inputs)
    loss = loss_fn(outputs, targets)
    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch+1) % 5 == 0:
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))
# Plot the graph
predicted = model(torch.from_numpy(x_train)).detach().numpy()
plt.plot(x_train, y_train, 'ro', label='Original data')
plt.plot(x_train, predicted, label='Fitted line')
plt.legend()
plt.show()
Expected result: loss function decreases and the linear regression is fitted (see output below)
Epoch [5/60], Loss: 7.9943
Epoch [10/60], Loss: 7.7597
Epoch [15/60], Loss: 7.6619
Epoch [20/60], Loss: 7.6102
Epoch [25/60], Loss: 7.4971
Epoch [30/60], Loss: 7.4106
Epoch [35/60], Loss: 7.3942
Epoch [40/60], Loss: 7.2438
Epoch [45/60], Loss: 7.2322
Epoch [50/60], Loss: 7.1012
Epoch [55/60], Loss: 7.0701
Epoch [60/60], Loss: 6.9612
Actual result: no change in loss function
Epoch [5/60], Loss: 73.7473
Epoch [10/60], Loss: 73.7473
Epoch [15/60], Loss: 73.7473
Epoch [20/60], Loss: 73.7473
Epoch [25/60], Loss: 73.7473
Epoch [30/60], Loss: 73.7473
Epoch [35/60], Loss: 73.7473
Epoch [40/60], Loss: 73.7473
Epoch [45/60], Loss: 73.7473
Epoch [50/60], Loss: 73.7473
Epoch [55/60], Loss: 73.7473
Epoch [60/60], Loss: 73.7473
I'm pretty confused as to why such a simple operation is breaking the backpropagation gradients and would really appreciate it if anyone had some insights on why this isn't working.

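Use torch.stack((l1loss, l2loss)) instead. torch.tensor([l1loss, l2loss]) creates a new tensor from the values of the existing tensors, which cuts it off from the computation graph, so no gradients reach the model. (torch.cat does not work here, because it cannot concatenate zero-dimensional tensors.)
That said, you shouldn't do this unless you are trying to generalize your loss function; it's pretty unreadable. Simple addition is much better.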
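A minimal sketch of a weighted loss that stays connected to the graph, assuming the two terms are zero-dimensional tensors as in the question. torch.stack is used rather than torch.cat because torch.norm returns zero-dimensional tensors here, which torch.cat does not accept. The helper name weighted_loss and the three-point dataset are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(1, 1)
inputs = torch.tensor([[1.0], [2.0], [3.0]])
targets = torch.tensor([[2.0], [4.0], [6.0]])

def weighted_loss(outputs, targets, weights):
    # Hypothetical helper: weights the L1 and L2 terms with a dot product.
    l1loss = torch.norm(outputs - targets, 1)
    l2loss = torch.norm(outputs - targets, 2)
    # torch.stack builds the vector from the existing tensors,
    # so both terms remain part of the autograd graph.
    losses = torch.stack((l1loss, l2loss))
    return torch.dot(weights, losses)

weights = torch.tensor([1.0, 1.0])  # constants, no gradient needed
loss = weighted_loss(model(inputs), targets, weights)
loss.backward()

# After backward(), the model parameters carry gradients.
print(model.weight.grad, model.bias.grad)
```

If the weights themselves should be learned, they would need requires_grad=True and would have to be passed to the optimizer as well.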

Related

How to see the loss of the best epoch from early stopping in Keras?

I have managed to implement early stopping into my Keras model, but I am not sure how I can view the loss of the best epoch.
es = EarlyStopping(monitor='val_out_soft_loss',
                   mode='min',
                   restore_best_weights=True,
                   verbose=2,
                   patience=10)
model.fit(tr_x,
          tr_y,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          callbacks=[es],
          validation_data=(val_x, val_y))
loss = model.history.history["val_out_soft_loss"][-1]
return model, loss
The way I have defined the loss score means that the returned score comes from the final epoch, not the best epoch.
Example:
from sklearn.model_selection import train_test_split, KFold
losses = []
models = []
for k in range(2):
    kfold = KFold(5, random_state = 42 + k, shuffle = True)
    for k_fold, (tr_inds, val_inds) in enumerate(kfold.split(train_y)):
        print("-----------")
        print("-----------")
        model, loss = get_model(64, 100)
        models.append(model)
        print(k_fold, loss)
        losses.append(loss)
print("-------")
print(losses)
print(np.mean(losses))
Epoch 23/100
18536/18536 [==============================] - 7s 362us/step - loss: 0.0116 - out_soft_loss: 0.0112 - out_reg_loss: 0.0393 - val_loss: 0.0131 - val_out_soft_loss: 0.0127 - val_out_reg_loss: 0.0381
Epoch 24/100
18536/18536 [==============================] - 7s 356us/step - loss: 0.0116 - out_soft_loss: 0.0112 - out_reg_loss: 0.0388 - val_loss: 0.0132 - val_out_soft_loss: 0.0127 - val_out_reg_loss: 0.0403
Restoring model weights from the end of the best epoch
Epoch 00024: early stopping
0 0.012735568918287754
So in this example, I would like to see the loss at Epoch 00014 (which is 0.0124).
I also have a separate question: How can I set the decimal places for the val_out_soft_loss score?
Assign the return value of fit() to a variable so you can track the metrics across epochs.
history = model.fit(tr_x, ...
fit() returns a History object; its history attribute is a dictionary mapping each metric name to a list with one value per epoch:
loss_hist = history.history['val_out_soft_loss']
Then use min() to get the lowest loss and argmin() to get the index of the best epoch (zero-based):
np.min(loss_hist)
np.argmin(loss_hist)
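A small sketch of that lookup, which also covers the question about decimal places. The dictionary below is a hypothetical stand-in for history.history with made-up numbers, so the snippet runs without training a model.

```python
import numpy as np

# Hypothetical stand-in for `history.history` after model.fit();
# the values are invented for illustration only.
history_dict = {
    "val_out_soft_loss": [0.0150, 0.0131, 0.0124, 0.0127, 0.0127],
}

val_losses = np.array(history_dict["val_out_soft_loss"])
best_epoch = int(np.argmin(val_losses))   # zero-based index of the best epoch
best_loss = float(val_losses[best_epoch])

# A format specifier controls the number of decimal places when printing.
print("best epoch (1-based): {}, loss: {:.4f}".format(best_epoch + 1, best_loss))
```

Keras prints epochs starting at 1, so the zero-based index from argmin() needs the +1 to line up with the training log.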

PyTorch: Different training accuracies using same random seed

I am trying to evaluate my model on the whole training set after each epoch.
This is what I did:
torch.manual_seed(1)
model = ConvNet(num_classes=num_classes)
cost_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
def compute_accuracy(model, data_loader):
    correct_pred, num_examples = 0, 0
    for features, targets in data_loader:
        logits = model(features)
        predicted_labels = torch.argmax(logits, 1)
        num_examples += targets.size(0)
        correct_pred += (predicted_labels == targets).sum()
    return correct_pred.float()/num_examples * 100
for epoch in range(num_epochs):
    model = model.train()
    for features, targets in train_loader:
        logits = model(features)
        cost = cost_fn(logits, targets)
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()
    model = model.eval()
    print('Epoch: %03d/%03d training accuracy: %.2f%%' % (
        epoch+1, num_epochs,
        compute_accuracy(model, train_loader)))
The output was convincing:
Epoch: 001/005 training accuracy: 89.08%
Epoch: 002/005 training accuracy: 90.41%
Epoch: 003/005 training accuracy: 91.70%
Epoch: 004/005 training accuracy: 92.31%
Epoch: 005/005 training accuracy: 92.95%
But then I added another line at the end of the training loop, to also evaluate the model on the whole test set after each epoch:
for epoch in range(num_epochs):
    model = model.train()
    for features, targets in train_loader:
        logits = model(features)
        cost = cost_fn(logits, targets)
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()
    model = model.eval()
    print('Epoch: %03d/%03d training accuracy: %.2f%%' % (
        epoch+1, num_epochs,
        compute_accuracy(model, train_loader)))
    print('\t\t testing accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))
But the training accuracies started to change:
Epoch: 001/005 training accuracy: 89.08%
testing accuracy: 87.66%
Epoch: 002/005 training accuracy: 90.42%
testing accuracy: 89.04%
Epoch: 003/005 training accuracy: 91.84%
testing accuracy: 90.01%
Epoch: 004/005 training accuracy: 91.86%
testing accuracy: 89.83%
Epoch: 005/005 training accuracy: 92.45%
testing accuracy: 90.32%
Am I doing something wrong? I expected the training accuracies to remain the same because the manual seed is 1 in both cases.
Is this the expected output?
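Setting the random seed does not stop the model from learning; it only fixes the sequence of pseudo-random numbers that PyTorch generates. In this case the seed (1) determines, among other things, how the training data is shuffled. The seed does not fix how many numbers are drawn from that sequence, though: any extra consumer of random numbers, which probably includes the additional pass over test_loader, shifts every later draw. The training batches are then shuffled differently from the second epoch onwards, which would explain why the accuracies match in epoch 1 and diverge afterwards.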
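A minimal sketch of how a fixed seed behaves when the number of random draws changes. It uses plain torch.rand calls instead of data loaders, so it only illustrates the general mechanism; whether the extra test-set pass in the question consumes random numbers in exactly this way is an assumption.

```python
import torch

torch.manual_seed(1)
first = torch.rand(5)

torch.manual_seed(1)
repeat = torch.rand(5)    # same seed, same number of draws

torch.manual_seed(1)
_ = torch.rand(1)         # one extra draw before the values we care about
shifted = torch.rand(5)

# Identical seed and identical call sequence reproduce the same values.
same_when_identical = torch.equal(first, repeat)
# An extra draw in between shifts the sequence, so the values differ.
same_after_extra_draw = torch.equal(first, shifted)
print(same_when_identical, same_after_extra_draw)
```

One way to make the two runs comparable is to reseed at the start of every epoch, or to give the shuffling loader its own torch.Generator so that other code cannot advance its state.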

How is the Keras accuracy shown in the progress bar calculated? From which inputs is it calculated? How to replicate it?

I am trying to understand what the accuracy "acc" shown in the Keras progress bar at the end of an epoch is:
13/13 [==============================] - 0s 76us/step - loss: 0.7100 - acc: 0.4615
At the end of an epoch it should be the accuracy of the model predictions on all training samples. However, when the model is evaluated on the same training samples, the actual accuracy can be very different.
Below is an example adapted from the MLP for binary classification on the Keras website. A simple sequential neural net performs binary classification of randomly generated numbers. The batch size equals the number of training examples (13), so every epoch contains only one step. Since the loss is set to binary_crossentropy, the accuracy is calculated with binary_accuracy, defined in metrics.py. The MyEval class defines a callback that is called at the end of each epoch. It calculates the accuracy on the training data in two ways: a) with model.evaluate, and b) with model.predict to get the predictions, followed by almost the same code as the Keras binary_accuracy function. These two accuracies are consistent with each other, but most of the time they differ from the one in the progress bar. Why are they different? Is it possible to calculate the same accuracy as in the progress bar? Or have I made a mistake in my assumptions?
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import callbacks
np.random.seed(1) # fix random seed for reproducibility
# Generate dummy data
x_train = np.random.random((13, 20))
y_train = np.random.randint(2, size=(13, 1))
model = Sequential()
model.add(Dense(64, input_dim=20, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
class MyEval(callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        my_accuracy_1 = self.model.evaluate(x_train, y_train, verbose=0)[1]
        y_pred = self.model.predict(x_train)
        my_accuracy_2 = np.mean(np.equal(y_train, np.round(y_pred)))
        print("my accuracy 1: {}".format(my_accuracy_1))
        print("my accuracy 2: {}".format(my_accuracy_2))
my_eval = MyEval()
model.fit(x_train, y_train,
          epochs=5,
          batch_size=13,
          callbacks=[my_eval],
          shuffle=False)
The output of the above code:
13/13 [==============================] - 0s 25ms/step - loss: 0.7303 - acc: 0.5385
my accuracy 1: 0.5384615659713745
my accuracy 2: 0.5384615384615384
Epoch 2/5
13/13 [==============================] - 0s 95us/step - loss: 0.7412 - acc: 0.4615
my accuracy 1: 0.9230769276618958
my accuracy 2: 0.9230769230769231
Epoch 3/5
13/13 [==============================] - 0s 77us/step - loss: 0.7324 - acc: 0.3846
my accuracy 1: 0.9230769276618958
my accuracy 2: 0.9230769230769231
Epoch 4/5
13/13 [==============================] - 0s 72us/step - loss: 0.6543 - acc: 0.5385
my accuracy 1: 0.9230769276618958
my accuracy 2: 0.9230769230769231
Epoch 5/5
13/13 [==============================] - 0s 76us/step - loss: 0.6459 - acc: 0.6923
my accuracy 1: 0.8461538553237915
my accuracy 2: 0.8461538461538461
using: Python 3.5.2, tensorflow-gpu==1.14.0 Keras==2.2.4 numpy==1.15.2
I think it has to do with the use of Dropout. Dropout is only enabled during training, not during evaluation or prediction, hence the discrepancy between the accuracies during training and evaluation/prediction.
Moreover, the training accuracy displayed in the bar is the average, over the training epoch, of the batch accuracies calculated after each batch. Keep in mind that the model parameters are updated after each batch, so the accuracy shown in the bar at the end does not exactly match the accuracy of a validation run after the epoch is finished (the training accuracy is calculated with different model parameters for each batch, whereas the validation accuracy is calculated with the same parameters for all batches).
This is your example, with more data (and therefore more than one batch per epoch) and without dropout:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import callbacks
np.random.seed(1) # fix random seed for reproducibility
# Generate dummy data
x_train = np.random.random((200, 20))
y_train = np.random.randint(2, size=(200, 1))
model = Sequential()
model.add(Dense(64, input_dim=20, activation='relu'))
# model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
# model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
class MyEval(callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        my_accuracy_1 = self.model.evaluate(x_train, y_train, verbose=0)[1]
        y_pred = self.model.predict(x_train)
        my_accuracy_2 = np.mean(np.equal(y_train, np.round(y_pred)))
        print("my accuracy 1 after epoch {}: {}".format(epoch + 1,my_accuracy_1))
        print("my accuracy 2 after epoch {}: {}".format(epoch + 1,my_accuracy_2))
my_eval = MyEval()
model.fit(x_train, y_train,
          epochs=5,
          batch_size=13,
          callbacks=[my_eval],
          shuffle=False)
The output reads:
Train on 200 samples
Epoch 1/5
my accuracy 1 after epoch 1: 0.5450000166893005
my accuracy 2 after epoch 1: 0.545
200/200 [==============================] - 0s 2ms/sample - loss: 0.6978 - accuracy: 0.5350
Epoch 2/5
my accuracy 1 after epoch 2: 0.5600000023841858
my accuracy 2 after epoch 2: 0.56
200/200 [==============================] - 0s 383us/sample - loss: 0.6892 - accuracy: 0.5550
Epoch 3/5
my accuracy 1 after epoch 3: 0.5799999833106995
my accuracy 2 after epoch 3: 0.58
200/200 [==============================] - 0s 496us/sample - loss: 0.6844 - accuracy: 0.5800
Epoch 4/5
my accuracy 1 after epoch 4: 0.6000000238418579
my accuracy 2 after epoch 4: 0.6
200/200 [==============================] - 0s 364us/sample - loss: 0.6801 - accuracy: 0.6150
Epoch 5/5
my accuracy 1 after epoch 5: 0.6050000190734863
my accuracy 2 after epoch 5: 0.605
200/200 [==============================] - 0s 393us/sample - loss: 0.6756 - accuracy: 0.6200
The validation accuracy after the epoch pretty much resembles the averaged training accuracy at the end of the epoch now.
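The dropout point in the answer above can be illustrated with a small NumPy sketch of inverted dropout. This is not the Keras implementation; the dropout helper and its arguments are hypothetical and only show why the same input gives different outputs in training and inference mode.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout (hypothetical helper, not the Keras implementation)."""
    if not training:
        return x                        # inference: activations pass through unchanged
    keep = rng.random(x.shape) >= rate  # keep each unit with probability 1 - rate
    return x * keep / (1.0 - rate)      # rescale so the expected value stays the same

rng = np.random.default_rng(0)
activations = np.ones((4, 8))

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
eval_out = dropout(activations, rate=0.5, training=False, rng=rng)

print(train_out)  # a random mix of 0.0 and 2.0
print(eval_out)   # identical to the input
```

The progress bar reports metrics computed on the noisy training-mode outputs, while evaluate() and predict() use the inference-mode outputs, so with a small network and a dropout rate of 0.5 the two can differ substantially.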

Metrics not displaying when running model.fit

I am working my way through an ML example in Google Colab. The documentation says that when I run model.fit, the loss and accuracy metrics are displayed. I am not seeing any loss or accuracy metrics.
I have added accuracy as a metric in model.compile
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
Here is a screenshot of what I am seeing.
How do I get the loss and accuracy metrics to be displayed when I am fitting the model?
You can use the verbose flag and set it to 2 to display 1 line per epoch or 1 for a progress bar.
import keras
import numpy as np
model = keras.Sequential()
model.add(keras.layers.Dense(10, input_shape=(5, 6)))
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy')
x_data = np.random.random((32, 5, 6))
y_data = np.random.randint(0, 9, size=(32,5,1))
# verbose=1 shows a progress bar, verbose=2 prints one line per epoch
model.fit(x=x_data, y=y_data, batch_size=16, epochs=3, verbose=1)
Epoch 1/3
32/32 [==============================] - 1s 20ms/step - loss: 9.9664
Epoch 2/3
32/32 [==============================] - 0s 293us/step - loss: 9.9537
Epoch 3/3
32/32 [==============================] - 0s 164us/step - loss: 9.9425
I hope it solves your problem.

Training loss doesn't match validation loss when X_train = X_test

I am running:
D.fit(X_train, y_train, nb_epoch=12,validation_data=(X_train,y_train))
But I get outputs like:
Train on 61936 samples, validate on 61936 samples
Epoch 1/12
61936/61936 [==============================] - 10s 166us/step - loss: 0.0021 - val_loss: 1.5650e-04
Epoch 2/12
61936/61936 [==============================] - 10s 165us/step - loss: 0.0014 - val_loss: 6.6482e-04
...
Epoch 10/12
61936/61936 [==============================] - 11s 170us/step - loss: 0.0104 - val_loss: 9.6666e-05
This is a known issue:
https://github.com/keras-team/keras/issues/605
The other reason that the results are different is because the model
is being trained while the "loss" is being computed, whereas the model
is fixed while "val_loss" is being computed. Since the model is
training, "loss" is typically going to be larger than the true
training set loss at the end of the epoch. I.e. "loss" is the average
loss during the epoch, and "val_loss" is the average loss after the
end of the epoch. Since the model changes during the epoch, the loss
changes.
These will generally not match. Validation loss is computed over the whole dataset with the weights fixed, while training loss is the average of the loss across batches (the weights change after every batch). If you want the real loss on the training set, run model.evaluate(X_train, y_train).
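A minimal NumPy sketch of that difference, using a one-weight linear model trained with plain mini-batch gradient descent for a single epoch. The data, learning rate and the mse helper are made up for illustration; the averaging of batch losses is only meant to mimic what the progress bar reports, not to reproduce the Keras code.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=256)

w = 0.0          # single weight, no bias
lr = 0.1
batch_size = 32

def mse(w, X, y):
    # Mean squared error of the one-weight model on the given data.
    return float(np.mean((X[:, 0] * w - y) ** 2))

batch_losses = []
for start in range(0, len(X), batch_size):
    xb = X[start:start + batch_size]
    yb = y[start:start + batch_size]
    batch_losses.append(mse(w, xb, yb))  # loss with the weights before this update
    grad = np.mean(2 * (xb[:, 0] * w - yb) * xb[:, 0])
    w -= lr * grad                       # weights change after every batch

reported_loss = float(np.mean(batch_losses))  # average over the epoch, like "loss"
final_loss = mse(w, X, y)                     # fixed end-of-epoch weights, like "val_loss"

print(reported_loss, final_loss)
```

Because the early batches are scored with the untrained weight, the epoch average comes out well above the loss measured afterwards with the final weight, even though both are computed on the same data.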
