PyTorch: Different training accuracies using same random seed

I am trying to evaluate my model on the whole training set after each epoch.
This is what I did:
torch.manual_seed(1)
model = ConvNet(num_classes=num_classes)
cost_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

def compute_accuracy(model, data_loader):
    correct_pred, num_examples = 0, 0
    for features, targets in data_loader:
        logits = model(features)
        predicted_labels = torch.argmax(logits, 1)
        num_examples += targets.size(0)
        correct_pred += (predicted_labels == targets).sum()
    return correct_pred.float() / num_examples * 100

for epoch in range(num_epochs):
    model = model.train()
    for features, targets in train_loader:
        logits = model(features)
        cost = cost_fn(logits, targets)
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()
    model = model.eval()
    print('Epoch: %03d/%03d training accuracy: %.2f%%' % (
          epoch+1, num_epochs,
          compute_accuracy(model, train_loader)))
the output was convincing:
Epoch: 001/005 training accuracy: 89.08%
Epoch: 002/005 training accuracy: 90.41%
Epoch: 003/005 training accuracy: 91.70%
Epoch: 004/005 training accuracy: 92.31%
Epoch: 005/005 training accuracy: 92.95%
But then I added another line at the end of the training loop, to also evaluate the model on the whole test set after each epoch:
for epoch in range(num_epochs):
    model = model.train()
    for features, targets in train_loader:
        logits = model(features)
        cost = cost_fn(logits, targets)
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()
    model = model.eval()
    print('Epoch: %03d/%03d training accuracy: %.2f%%' % (
          epoch+1, num_epochs,
          compute_accuracy(model, train_loader)))
    print('\t\t testing accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))
But the training accuracies started to change:
Epoch: 001/005 training accuracy: 89.08%
testing accuracy: 87.66%
Epoch: 002/005 training accuracy: 90.42%
testing accuracy: 89.04%
Epoch: 003/005 training accuracy: 91.84%
testing accuracy: 90.01%
Epoch: 004/005 training accuracy: 91.86%
testing accuracy: 89.83%
Epoch: 005/005 training accuracy: 92.45%
testing accuracy: 90.32%
Am I doing something wrong? I expected the training accuracies to remain the same, because the manual seed is 1 in both cases.
Is this expected output?

Setting the random seed does not pin the results down; it only fixes the starting point of a pseudo-random sequence. In this case you told the loader to shuffle the training data using numbers drawn from that sequence, and the extra pass over the test loader also consumes numbers from the same global sequence, so from the second epoch on the training data is shuffled differently than in the first run and the training accuracies drift apart slightly.
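If you want the training shuffle to stay identical whether or not you run the extra evaluation, a minimal sketch (assuming a recent PyTorch version; `train_dataset`, `test_dataset`, and the batch size of 128 are hypothetical stand-ins for your own objects) is to give the training loader its own torch.Generator instead of relying on the global RNG:

import torch
from torch.utils.data import DataLoader

g = torch.Generator()
g.manual_seed(1)

# The training loader now draws its shuffle order from `g`, so iterating the
# test loader (which still touches the global RNG) no longer changes the
# batches seen during training.
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True, generator=g)
test_loader = DataLoader(test_dataset, batch_size=128, shuffle=False)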

Related

Simulate streaming learning using Tensorflow's fit() and evaluate() built-in methods

What I'm trying to achieve is to simulate a streaming learning method using Tensorflow's fit() and evaluate() methods.
What I have until now is a script like this, after getting some help from the community here:
import pandas as pd
import tensorflow as tf

df = pd.read_csv('labeled_tweets_processed.csv')
labels = df.pop('class')
dataset = tf.data.Dataset.from_tensor_slices((df, labels))

VOCAB_SIZE = 1000
encoder = tf.keras.layers.TextVectorization(max_tokens=VOCAB_SIZE)
encoder.adapt(dataset.map(lambda text, label: text))

BUFFER_SIZE = 2
BATCH_SIZE = 1
train_dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(
        input_dim=len(encoder.get_vocabulary()),
        output_dim=64,
        # Use masking to handle the variable sequence lengths
        mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])

model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])
to set up the model, and I train it with this command:
history = model.fit(train_dataset, epochs=1)
What I actually want to do is to simulate a Streaming environment where I have a pipeline like Predict -> Fit into the model.
I thought it could be accomplished by using a method like:
for x, y in enumerate(train_dataset):
    test_loss, test_acc = model.evaluate([x, y])
    model.fit(y)
but it doesn't seem to work right like this.
What is the right way to simulate the described environment?
What is the best way to iterate through each entry of the dataset and feed it to the desired methods?
Thank you very much in advance!
Update 1:
This is what I have right now, but it results in very low model accuracy. I'm not sure the metrics are updated the right way.
for idx, (x, y) in enumerate(train_dataset):
    pred = model.predict_on_batch(x)
    print(model.test_on_batch(x, pred, reset_metrics=False, return_dict=True))
    model.train_on_batch(x, y, reset_metrics=False)
    print(f"After {idx} entries")
You can try something like this:
for idx, (x, y) in enumerate(train_dataset):
    test_loss, test_acc = model.evaluate(x, y)
    model.fit(x, y, epochs=1)
Update 1:
Maybe try using a custom training loop:
import pandas as pd
import tensorflow as tf

df = pd.DataFrame(data={'texts': ['Some text ssss', 'Some text', 'Some text', 'Some text', 'Some text'],
                        'class': [0, 0, 1, 1, 1]})
labels = df.pop('class')
dataset = tf.data.Dataset.from_tensor_slices((df, labels))

VOCAB_SIZE = 1000
encoder = tf.keras.layers.TextVectorization(max_tokens=VOCAB_SIZE)
encoder.adapt(dataset.map(lambda text, label: text))

BUFFER_SIZE = 2
BATCH_SIZE = 3
train_dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(
        input_dim=len(encoder.get_vocabulary()),
        output_dim=64,
        # Use masking to handle the variable sequence lengths
        mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

opt = tf.keras.optimizers.Adam(1e-4)
loss_fn = tf.keras.losses.BinaryCrossentropy()
train_acc_metric = tf.keras.metrics.BinaryAccuracy()
test_acc_metric = tf.keras.metrics.BinaryAccuracy()

epochs = 2
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch + 1,))
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        pred = model(x_batch_train)
        test_acc_metric.update_state(y_batch_train, pred)
        print("Current test acc: %.4f" % (float(test_acc_metric.result()),))

        with tf.GradientTape() as tape:
            logits = model(x_batch_train, training=True)
            loss_value = loss_fn(y_batch_train, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        opt.apply_gradients(zip(grads, model.trainable_weights))

        train_acc_metric.update_state(y_batch_train, logits)
        print("Current train acc: %.4f" % (float(train_acc_metric.result()),))

    test_acc = test_acc_metric.result()
    print("Total test acc over epoch: %.4f" % (float(test_acc),))
    test_acc_metric.reset_states()

    train_acc = train_acc_metric.result()
    print("Total train acc over epoch: %.4f" % (float(train_acc),))
    train_acc_metric.reset_states()
Start of epoch 1
Current test acc: 0.6922
Current train acc: 0.6922
Current test acc: 0.6936
Current train acc: 0.6936
Current test acc: 0.6928
Current train acc: 0.6928
Current test acc: 0.6934
Current train acc: 0.6934
Current test acc: 0.6938
Current train acc: 0.6938
Total test acc over epoch: 0.6938
Total train acc over epoch: 0.6938
Start of epoch 2
Current test acc: 0.6914
Current train acc: 0.6914
Current test acc: 0.6914
Current train acc: 0.6914
Current test acc: 0.6926
Current train acc: 0.6926
Current test acc: 0.6932
Current train acc: 0.6932
Current test acc: 0.6936
Current train acc: 0.6936
Total test acc over epoch: 0.6936
Total train acc over epoch: 0.6936
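As a side note on the asker's Update 1: model.test_on_batch(x, pred, ...) scores the model's predictions against themselves, which makes the reported metrics meaningless. A short sketch of the intended test-then-train loop using the true labels y, with the same Keras methods and names the asker already uses:

for idx, (x, y) in enumerate(train_dataset):
    # Score the current model on a batch it has not yet trained on
    print(model.test_on_batch(x, y, reset_metrics=False, return_dict=True))
    # Then train on that same batch
    model.train_on_batch(x, y, reset_metrics=False)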

How to find training accuracy in PyTorch

def train_and_test(e):
    epochs = e
    train_losses, test_losses, val_acc, train_acc = [], [], [], []
    valid_loss_min = np.Inf
    model.train()
    print("Model Training started.....")
    for epoch in range(epochs):
        running_loss = 0
        batch = 0
        for images, labels in trainloader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            batch += 1
            if batch % 10 == 0:
                print(f" epoch {epoch + 1} batch {batch} completed")
        test_loss = 0
        accuracy = 0
        with torch.no_grad():
            print(f"validation started for {epoch + 1}")
            model.eval()
            for images, labels in validloader:
                images, labels = images.to(device), labels.to(device)
                logps = model(images)
                test_loss += criterion(logps, labels)
                ps = torch.exp(logps)
                top_p, top_class = ps.topk(1, dim=1)
                equals = top_class == labels.view(*top_class.shape)
                accuracy += torch.mean(equals.type(torch.FloatTensor))
        train_losses.append(running_loss / len(trainloader))
        test_losses.append(test_loss / len(validloader))
        val_acc.append(accuracy / len(validloader))
        training_acc.append(running_loss / len(trainloader))
        scheduler.step()
        print("Epoch: {}/{}.. ".format(epoch + 1, epochs),
              "Training Loss: {:.3f}.. ".format(train_losses[-1]),
              "Valid Loss: {:.3f}.. ".format(test_losses[-1]),
              "Valid Accuracy: {:.3f}".format(accuracy / len(validloader)),
              "train Accuracy: {:.3f}".format(running_loss / len(trainloader)))
        model.train()
        if test_loss / len(validloader) <= valid_loss_min:
            print('Validation loss decreased ({:.6f} --> {:.6f}). Saving model ...'.format(
                valid_loss_min, test_loss / len(validloader)))
            torch.save({
                'epoch': epoch,
                'model': model,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'loss': valid_loss_min
            }, path)
            valid_loss_min = test_loss / len(validloader)
    print('Training Completed Succesfully !')
    return train_losses, test_losses, val_acc, train_acc
my output is
Model Training started.....
epoch 1 batch 10 completed
epoch 1 batch 20 completed
epoch 1 batch 30 completed
epoch 1 batch 40 completed
validation started for 1
Epoch: 1/2.. Training Loss: 0.088.. Valid Loss: 0.072.. Valid Accuracy: 0.979 train Accuracy: 0.088
Validation loss decreased (inf --> 0.072044). Saving model ...
I am using a multi-class classification dataset, and the training accuracy and training loss come out equal, so I think there is an error in the training accuracy code:

training_acc.append(running_loss / len(trainloader))
"train Accuracy: {:.3f}".format(running_loss / len(trainloader))

Replacing these with

training_acc.append(accuracy / len(trainloader))
"train Accuracy: {:.3f}".format(accuracy / len(trainloader))

is also not working.
This method should be followed to plot training losses as well as accuracy:
running_loss = 0
acc = 0
for images, labels in trainloader:
    # start = time.time()
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()  # Clear the gradients, because gradients accumulate across batches
    # Forward pass - compute outputs on input data using the model
    outputs = model(images)  # modeling for each image batch
    loss = criterion(outputs, labels)  # calculating the loss
    # The backward pass
    loss.backward()  # This is where the model learns by backpropagating
    optimizer.step()  # And optimizes its weights here - update the parameters
    running_loss += loss.item()
    # Outputs of the network are log-probabilities, so take the exponential for probabilities
    ps = torch.exp(outputs)
    top_p, top_class = ps.topk(1, dim=1)
    equals = top_class == labels.view(*top_class.shape)
    # Convert correct_counts to float and then compute the mean
    acc += torch.mean(equals.type(torch.FloatTensor))
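Once the batch loop finishes, the per-epoch averages can be recorded; a short sketch along the same lines, reusing the question's own train_acc, train_losses, and trainloader names:

# After the inner batch loop of each epoch:
epoch_acc = (acc / len(trainloader)).item()      # mean of the per-batch accuracies
epoch_loss = running_loss / len(trainloader)     # mean loss, kept separate from accuracy
train_acc.append(epoch_acc)
train_losses.append(epoch_loss)
print("Training Loss: {:.3f}.. train Accuracy: {:.3f}".format(epoch_loss, epoch_acc))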

Different result inside train function and outside it

I am playing with TensorFlow 2. I made my own model, similar to how it is done here.
Then I created my own fit function. Now I get the weirdest thing ever. Here is an exact copy/paste from my notebook where I did the tests:
def fit(x_train, y_train, learning_rate=0.01, epochs=10, batch_size=100, normal=True, verbose=True, display_freq=100):
    if normal:
        x_train = normalize(x_train)  # TODO: This normalize could be a bit different for each and be bad.
    num_tr_iter = int(len(y_train) / batch_size)  # Number of training iterations in each epoch
    if verbose:
        print("Starting training...")
    for epoch in range(epochs):
        # Randomly shuffle the training data at the beginning of each epoch
        x_train, y_train = randomize(x_train, y_train)
        for iteration in range(num_tr_iter):
            # Get the batch
            start = iteration * batch_size
            end = (iteration + 1) * batch_size
            x_batch, y_batch = get_next_batch(x_train, y_train, start, end)
            # Run optimization op (backpropagation)
            # import pdb; pdb.set_trace()
            if verbose and (epoch * batch_size + iteration) % display_freq == 0:
                current_loss = _apply_loss(y_train, model(x_train, training=True))
                current_acc = evaluate_accuracy(x_train, y_train)
                print("Epoch: {0}/{1}; batch {2}/{3}; loss: {4:.4f}; accuracy: {5:.2f} %"
                      .format(epoch, epochs, iteration, num_tr_iter, current_loss, current_acc * 100))
            train_step(x_batch, y_batch, learning_rate)
    current_loss = _apply_loss(y_train, model(x_train, training=True))
    current_acc = evaluate_accuracy(x_train, y_train)
    print("End: loss: {0:.4f}; accuracy: {1:.2f} %".format(current_loss, current_acc * 100))

import logging
logging.getLogger('tensorflow').disabled = True

fit(x_train, y_train)
current_loss = _apply_loss(y_train, model(x_train, training=True))
current_acc = evaluate_accuracy(x_train, y_train)
print("End: loss: {0:.4f}; accuracy: {1:.2f} %".format(current_loss, current_acc * 100))
This segment outputs:
Starting training...
Epoch: 0/10; batch 0/80; loss: 0.9533; accuracy: 59.67 %
Epoch: 1/10; batch 0/80; loss: 0.9386; accuracy: 60.15 %
Epoch: 2/10; batch 0/80; loss: 0.9259; accuracy: 60.50 %
Epoch: 3/10; batch 0/80; loss: 0.9148; accuracy: 61.05 %
Epoch: 4/10; batch 0/80; loss: 0.9051; accuracy: 61.15 %
Epoch: 5/10; batch 0/80; loss: 0.8968; accuracy: 61.35 %
Epoch: 6/10; batch 0/80; loss: 0.8896; accuracy: 61.27 %
Epoch: 7/10; batch 0/80; loss: 0.8833; accuracy: 61.51 %
Epoch: 8/10; batch 0/80; loss: 0.8780; accuracy: 61.52 %
Epoch: 9/10; batch 0/80; loss: 0.8733; accuracy: 61.54 %
End: loss: 0.8733; accuracy: 61.54 %
End: loss: 0.4671; accuracy: 77.08 %
Now my question is: how is it that I get different values on the last two lines?! I am doing the same thing, right? I am totally puzzled here; I don't even know how to google this.
So the problem was just silly: it was the normalize call at the start of fit. Inside fit the model was trained and evaluated on the normalized x_train, while the evaluation outside used the raw data. Removing it made the two results match.
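In other words, a safer pattern (a sketch using the question's own normalize, fit, _apply_loss, and evaluate_accuracy helpers, and assuming normalize is a deterministic scaling function) is to normalize once up front and reuse the same data everywhere:

x_train = normalize(x_train)           # apply the same transform once, up front
fit(x_train, y_train, normal=False)    # skip the normalization inside fit

# The external check now sees exactly the data the training loop saw:
current_loss = _apply_loss(y_train, model(x_train, training=True))
current_acc = evaluate_accuracy(x_train, y_train)
print("End: loss: {0:.4f}; accuracy: {1:.2f} %".format(current_loss, current_acc * 100))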

Nonexistent PyTorch gradients when dotting tensors in loss function

For the purposes of this MWE I'm trying to fit a linear regression using a custom loss function with multiple terms. However, I'm running into strange behavior when trying to weight the different terms in my loss function by dotting a weight vector with my losses. Just summing the losses works as expected; however, when dotting the weights and losses the backpropagation gets broken somehow and the loss function doesn't decrease.
I've tried enabling and disabling requires_grad on both tensors, but have been unable to replicate the expected behavior.
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

# Hyper-parameters
input_size = 1
output_size = 1
num_epochs = 60
learning_rate = 0.001

# Toy dataset
x_train = np.array([[3.3], [4.4], [5.5], [6.71], [6.93], [4.168],
                    [9.779], [6.182], [7.59], [2.167], [7.042],
                    [10.791], [5.313], [7.997], [3.1]], dtype=np.float32)
y_train = np.array([[1.7], [2.76], [2.09], [3.19], [1.694], [1.573],
                    [3.366], [2.596], [2.53], [1.221], [2.827],
                    [3.465], [1.65], [2.904], [1.3]], dtype=np.float32)

# Linear regression model
model = nn.Linear(input_size, output_size)

# Loss and optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

def loss_fn(outputs, targets):
    l1loss = torch.norm(outputs - targets, 1)
    l2loss = torch.norm(outputs - targets, 2)
    # This works as expected
    # loss = 1 * l1loss + 1 * l2loss
    # Loss never changes, no matter what combination of
    # requires_grad I set
    loss = torch.dot(torch.tensor([1.0, 1.0], requires_grad=False),
                     torch.tensor([l1loss, l2loss], requires_grad=True))
    return loss

# Train the model
for epoch in range(num_epochs):
    # Convert numpy arrays to torch tensors
    inputs = torch.from_numpy(x_train)
    targets = torch.from_numpy(y_train)

    # Forward pass
    outputs = model(inputs)
    loss = loss_fn(outputs, targets)

    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 5 == 0:
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch + 1, num_epochs, loss.item()))

# Plot the graph
predicted = model(torch.from_numpy(x_train)).detach().numpy()
plt.plot(x_train, y_train, 'ro', label='Original data')
plt.plot(x_train, predicted, label='Fitted line')
plt.legend()
plt.show()
Expected result: loss function decreases and the linear regression is fitted (see output below)
Epoch [5/60], Loss: 7.9943
Epoch [10/60], Loss: 7.7597
Epoch [15/60], Loss: 7.6619
Epoch [20/60], Loss: 7.6102
Epoch [25/60], Loss: 7.4971
Epoch [30/60], Loss: 7.4106
Epoch [35/60], Loss: 7.3942
Epoch [40/60], Loss: 7.2438
Epoch [45/60], Loss: 7.2322
Epoch [50/60], Loss: 7.1012
Epoch [55/60], Loss: 7.0701
Epoch [60/60], Loss: 6.9612
Actual result: no change in loss function
Epoch [5/60], Loss: 73.7473
Epoch [10/60], Loss: 73.7473
Epoch [15/60], Loss: 73.7473
Epoch [20/60], Loss: 73.7473
Epoch [25/60], Loss: 73.7473
Epoch [30/60], Loss: 73.7473
Epoch [35/60], Loss: 73.7473
Epoch [40/60], Loss: 73.7473
Epoch [45/60], Loss: 73.7473
Epoch [50/60], Loss: 73.7473
Epoch [55/60], Loss: 73.7473
Epoch [60/60], Loss: 73.7473
I'm pretty confused as to why such a simple operation is breaking the backpropagation gradients and would really appreciate it if anyone had some insights on why this isn't working.
Use torch.stack((l1loss, l2loss)) instead (torch.cat needs at least 1-D tensors, and these losses are 0-dimensional). Wrapping the losses in torch.tensor([l1loss, l2loss]) builds a brand-new tensor from the existing ones and cuts it out of the autograd graph, so no gradients flow back to the model.
That said, you shouldn't do this unless you are trying to generalize your loss function; it's pretty unreadable. Simple addition is way better.
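A minimal sketch of the weighted variant that keeps the graph intact, should you still want configurable weights:

def loss_fn(outputs, targets):
    l1loss = torch.norm(outputs - targets, 1)
    l2loss = torch.norm(outputs - targets, 2)
    weights = torch.tensor([1.0, 1.0])  # plain constants, no grad needed here
    # torch.stack keeps l1loss/l2loss connected to the autograd graph,
    # unlike torch.tensor([l1loss, l2loss]), which copies the values out of it
    return torch.dot(weights, torch.stack([l1loss, l2loss]))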

Pytorch LSTM each epoch starts from 0 accuracy

I'm training an LSTM model for time series prediction, and at each epoch my accuracy restarts from 0 as if I were training for the first time.
I attach below the training method snippet:
def train(model, loader, epoch, mini_batch_size, sequence_size):
    model.train()
    correct = 0
    padded_size = 0
    size_input = mini_batch_size * sequence_size
    for batch_idx, (inputs, labels, agreement_score) in enumerate(loader):
        if inputs.size(0) == size_input:
            inputs = inputs.clone().reshape(mini_batch_size, sequence_size, inputs.size(1))
            labels = labels.clone().squeeze().reshape(mini_batch_size * sequence_size)
            agreement_score = agreement_score.clone().squeeze().reshape(mini_batch_size * sequence_size)
        else:
            padded_size = size_input - inputs.size(0)
            (inputs, labels, agreement_score) = padd_incomplete_sequences(inputs, labels, agreement_score, mini_batch_size, sequence_size)
        inputs, labels, agreement_score = Variable(inputs.cuda()), Variable(labels.cuda()), Variable(agreement_score.cuda())
        output = model(inputs)
        loss = criterion(output, labels)
        loss = loss * agreement_score
        loss = loss.mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        pred = output.data.max(1, keepdim=True)[1]
        correct += pred.eq(labels.data.view_as(pred)).cuda().sum()
        accuracy = 100. * correct / (len(loader.dataset) + padded_size)
        print("Train: Epoch: {}, [{}/{} ({:.0f}%)]\t loss: {:.6f}, Accuracy: {}/{} ({:.0f}%)".format(
            epoch,
            batch_idx * len(output),
            (len(loader.dataset) + padded_size),
            100. * batch_idx / (len(loader.dataset) + padded_size),
            loss.item(),
            correct,
            (len(loader.dataset) + padded_size),
            accuracy))
    accuracy = 100. * correct / (len(loader.dataset) + padded_size)
    train_accuracy.append(accuracy)
    train_epochs.append(epoch)
    train_loss.append(loss.item())
Accordingly, my training loop looks like this:
for epoch in range(1, 10):
    train(audio_lstm_model, train_rnn_audio_loader, epoch, MINI_BATCH_SIZE, SEQUENCE_SIZE_AUDIO)
    evaluation(audio_lstm_model, validation_rnn_audio_loader, epoch, MINI_BATCH_SIZE, SEQUENCE_SIZE_AUDIO)
Consequently, my accuracy and loss restart at every epoch:
Train: Epoch: 1, [0/1039079 (0%)] loss: 0.921637, Accuracy: 0/1039079 (0%)
...
Train: Epoch: 1, [10368/1039079 (0%)] loss: 0.523242, Accuracy: 206010/1039079 (19%)
Test set: loss: 151.4845, Accuracy: 88222/523315 (16%)
Train: Epoch: 2, [0/1039079 (0%)] loss: 0.921497, Accuracy: 0/1039079 (0%)
If anyone has any clue about it, your help is welcomed!
Have a nice day!
The problem turned out to be that the sequence size was too small for the network to be able to make meaningful predictions from it.
After increasing the sequence length by a few orders of magnitude, I was able to improve my model after each epoch.
