Evaluate the model during training affects its performance PyTorch

Evaluate the model during training affects its performance PyTorch - python

In PyTorch, I want to evaluate my model on the validation set every eval_step during training, and I wrote code like this:
def tune(model, loader_train, loader_dev, optimizer, epochs, eval_step):
for epoch in range(epochs):
for step,x in enumerate(loader_train):
optimizer.zero_grad()
loss = model(x)
loss.backward()
optimizer.step()
if step % eval_step == 0:
model.eval()
test(model, loader_dev)
model.train()
When eval_step = int(len(loader_train)/2) and eval_step = int(len(loader_train)/8), they lead to quite different metric result after training through one whole epoch (which means the second output for the former differs the eighth output for the latter).
Could anyone explain why?
The length of loader_train is 20000 (it depends on batch size), and here is my testing script:
def test(model, loader_dev):
preds = []
labels = []
for step,x in enumerate(loader_dev):
preds.append(model(x).view(-1))
labels.apend(x['label'].view(-1))
metric = cal_metric(preds, labels)
logger.info(metric)

I think you probably set 'shffule=True' in your dataloader. Even though you fix 'random seed', dataloader in torch will generate different results if you use another dataloader while using current dataloader. In the scenario you describe, it may cause your model get data input in different order and then result in different metric result.

Related

BERT Classifier ValueError: Target size (torch.Size([4, 1])) must be the same as input size (torch.Size([4, 2]))

I'm training a Classifier Model but it's a few days that I cannot overcame a problem!
I have the ValueError: Target size (torch.Size([4, 1])) must be the same as input size (torch.Size([4, 2])) error but actually it seems correct to me ! Indeed I used unsqueeze(1) to put them of the same size. WHat else I can try? Thank you!
class SequenceClassifier(nn.Module):
def __init__(self, n_classes):
super(SequenceClassifier, self).__init__()
self.bert = BertModel.from_pretrained(PRE_TRAINED_MODEL_NAME,return_dict=False)
self.drop = nn.Dropout(p=0.3)
self.out = nn.Linear(self.bert.config.hidden_size, n_classes)
def forward(self, input_ids, attention_mask):
_, pooled_output = self.bert(
input_ids=input_ids,
attention_mask=attention_mask
)
output = self.drop(pooled_output)
return self.out(output)
model = SequenceClassifier(len(class_names))
model = model.to(device)
EPOCHS = 10
optimizer = AdamW(model.parameters(), lr=2e-5, correct_bias=False)
total_steps = len(train_data_loader) * EPOCHS
scheduler = get_linear_schedule_with_warmup(
optimizer,
num_warmup_steps=0,
num_training_steps=total_steps
)
weights=[0.5,1]
pos_weight=torch.FloatTensor(weights).to(device)
loss_fn=nn.BCEWithLogitsLoss(pos_weight=pos_weight)
def train_epoch(
model,
data_loader,
loss_fn,
optimizer,
device,
scheduler,
n_examples
):
model = model.train()
losses = []
correct_predictions = 0
for d in data_loader:
input_ids = d["input_ids"].to(device)
attention_mask = d["attention_mask"].to(device)
targets = d["targets"].to(device)
outputs = model(
input_ids=input_ids,
attention_mask=attention_mask
)
_, preds = torch.max(outputs, dim=1)
targets = targets.unsqueeze(1)
loss = loss_fn(outputs, targets)
correct_predictions += torch.sum(preds == targets)
losses.append(loss.item())
loss.backward()
nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
scheduler.step()
optimizer.zero_grad()
return correct_predictions.double() / n_examples, np.mean(losses)
%%time
history = defaultdict(list)
best_accuracy = 0
for epoch in range(EPOCHS):
print(f'Epoch {epoch + 1}/{EPOCHS}')
print('-' * 10)
train_acc, train_loss = train_epoch(
model,
train_data_loader,
loss_fn,
optimizer,
device,
scheduler,
len(df_train)
)
print(f'Train loss {train_loss} accuracy {train_acc}')
val_acc, val_loss = eval_model(
model,
val_data_loader,
loss_fn,
device,
len(df_val)
)
print(f'Val loss {val_loss} accuracy {val_acc}')
print()
history['train_acc'].append(train_acc)
history['train_loss'].append(train_loss)
history['val_acc'].append(val_acc)
history['val_loss'].append(val_loss)
if val_acc > best_accuracy:
torch.save(model.state_dict(), 'best_model_state.bin')
best_accuracy = val_acc
ValueError: Target size (torch.Size([4, 1])) must be the same as input size (torch.Size([4, 2]))
EDIT
I have a binary classifier problem, indeed i have 2 classes encoded 0 ("bad") and 1 ("good").

In case anyone stumbles on this like I did, I'll write out an answer since there aren't a lot of google hits for this target size/input size error and the previous answer has some factual inaccuracies.
Unlike the previous answer would suggest, the real problem isn't with the loss function but with the output of the model.nn.BCEWithLogitsLoss is completely fine for multi-label and multi-class applications. Chiara updated her post saying that in fact she has a binary classification problem, but even that should not be a problem for this loss function. So why the error?
The original code has:
outputs = model(
input_ids=input_ids,
attention_mask=attention_mask
)
_, preds = torch.max(outputs, dim=1)
This means "Run the model, then create preds with the row indeces of the highest output of the model". Obviously, there is only a "index of highest" if there are multiple predicted values. Multiple output values usually means multiple input classes, so I can see why Shai though this was multi-class. But why would we get multiple outputs from a binary classifier?
As it turns out, BERT (or Huggingface anyway) for binary problems expects that n_classes is set to 2 -- setting classes to 1 puts the model in regression mode. This means that under the hood, binary problems are treated like a two-class problem, outputting predictions with the size [2, batch size] -- one column predicting the chance of it being a 1 and one for the chance of it being 0. The loss fucntion throws an error because it is supplied with only one row of one-hot encoded labels: targets = d["targets"].to(device) so the labels have dimensions [batch size] or after the unsqueeze, [1, batch size]. Either way, the dimensions don't match up.
Some loss functions can deal with this fine, but others require the exact same dimensions. To make things more frustrating, for version 1.10, nn.BCEWithLogitsLoss requires matching dimensions but later versions do not.
One solution may therefore be to update your pytorch (version 1.11 would work for example).
For me, this was not an option, so I ended up going with a different loss function. nn.CrossEntropyLoss, as suggested by Shai, indeed does the trick as it accepts any input with the same length. In other words, they had a working solution for the wrong reasons.

You are using nn.BCEWithLogitsLoss loss function. This loss function works for binary classification tasks, and expects the predictions and the targets to be of the same shape (and float data type).
This is in contrast to the multi-class CE loss function nn.CrossEntropyLoss that expects the targets to be integers pointing to the proper place in the predicted class probabilities.
Please read carefully the doc of the functions you are using and make sure you are using them correctly.

Cannot improve model accuracy

I am building a general-purpose NN that would classify images (Dog/No Dog) and movie reviews(Good/Bad). I have to stick to a very specific architecture and loss function so changing these two seems out of the equation. My architecture is a two-layer network with relu followed by a sigmoid and a cross-entropy loss function. With 1000 epochs and a learning rate of around .001 I am getting 100 percent training accuracy and .72 testing accuracy.I was looking for suggestions to improve my testing accuracy.This is the layout of what I have:
def train_net(epochs,batch_size,train_x,train_y,model_size,lr):
n_x,n_h,n_y=model_size
model = Net(n_x, n_h, n_y)
optim = torch.optim.Adam(model.parameters(),lr=0.005)
loss_function = nn.BCELoss()
train_losses = []
accuracy = []
for epoch in range(epochs):
count=0
model.train()
train_loss = []
batch_accuracy = []
for idx in range(0, train_x.shape[0], batch_size):
batch_x = torch.from_numpy(train_x[idx : idx + batch_size]).float()
batch_y = torch.from_numpy(train_y[:,idx : idx + batch_size]).float()
model_output = model(batch_x)
batch_accuracy=[]
loss = loss_function(model_output, batch_y)
train_loss.append(loss.item())
preds = model_output > 0.5
nb_correct = (preds == batch_y).sum()
count+=nb_correct.item()
optim.zero_grad()
loss.backward()
# Scheduler made it worse
# scheduler.step(loss.item())
optim.step()
if epoch % 100 == 1:
train_losses.append(train_loss)
print("Iteration : {}, Training loss: {} ,Accuracy %: {}".format(epoch,np.mean(train_loss),(count/train_x.shape[0])*100))
plt.plot(np.squeeze(train_losses))
plt.ylabel('loss')
plt.xlabel('iterations (per tens)')
plt.title("Learning rate =" + str(lr))
plt.show()
return model
My model parameters:
batch_size = 32
lr = 0.0001
epochs = 1500
n_x = 12288 # num_px * num_px * 3
n_h = 7
n_y = 1
model_size=n_x,n_h,n_y
model=train_net(epochs,batch_size,train_x,train_y,model_size,or)
and this is the testing phase.
model.eval() #Setting the model to eval mode, hence making it deterministic.
test_loss = []
count=0;
loss_function = nn.BCELoss()
for idx in range(0, test_x.shape[0], batch_size):
with torch.no_grad():
batch_x = torch.from_numpy(test_x[idx : idx + batch_size]).float()
batch_y = torch.from_numpy(test_y[:,idx : idx + batch_size]).float()
model_output = model(batch_x)
preds = model_output > 0.5
loss = loss_function(model_output, batch_y)
test_loss.append(loss.item())
nb_correct = (preds == batch_y).sum()
count+=nb_correct.item()
print("test loss: {},test accuracy: {}".format(np.mean(test_loss),count/test_x.shape[0]))
Things I have tried:
Messing around with the learning rate, having momentum, using schedulers and changing batch sizes.Of course these were mainly guesses and not based on any valid assumptions.

The issue you're facing is overfitting. With 100% accuracy on the training set, your model is effectively memorizing the training set, then failing to generalize to unseen samples. The good news is this is a very common major challenge!
You need regularization. One method is dropout, whereby on different training epochs a random set of the NN connections are dropped, forcing the network to "learn" alternate pathways and weights, and softening sharp peaks in parameter space. Since you need to keep your architecture and loss function the same, you won't be able to add such an option in (though for completeness, read this article for a description and implementation of dropout in PyTorch).
Given your constraints, you'll want to use something like L2 or L1 weight regularization. This typically shows up in the way of adding an additional term to the cost/loss function, which penalizes large weights. In PyTorch, L2 regularization is implemented via the torch.optim construct, with the option weight_decay. (See documentation: torch.optim, search for 'L2')
For your code, try something like:
def train_net(epochs,batch_size,train_x,train_y,model_size,lr):
...
optim = torch.optim.Adam(model.parameters(),...,weight_decay=0.01)
...

Based on your statement that your training accuracy is 100%, while your testing accuracy is significantly lower at 72%, it seems that you are significantly overfitting your dataset.
In short, this means that your model is training itself too specifically to the training data that you've given it, picking up on quirks that may exist in the training data but which are not inherent to the classification. For example, if the dogs in your training data were all white, the model would eventually learn to associate the color white with dogs, and be hard-pressed to recognize dogs of other colors given to it in the test data set.
There are many avenues to address this issue: a well sourced overview of the subject written in simple terms can be found here.
Without more information on the specific constraints you have around changing the architecture of the neural network, it's tough to say for sure what you will and will not be able to change. However, weight regularization and dropout are often used to great effect (and are described in the above article.) You should also be free to implement early stopping and a weight constraint to the model.
I'll leave it you to find resources on how to implement these specific strategies in pytorch, but this should provide a good jumping off point.

how to evaluate and get accuracy of a Feed forward neural network with pytorch

I started using Pytorch and I'm currently working on a Project where I'm using a simple feed forward neural network for linear regression. The Problem is I didn't find anything in Pytorch that allows me to get the Accuracy of a linear regression Model as in Keras or in SKlearn. in keras it would be simple just by setting metrics=["accuracy"] inside the compile function. I searched in the docs and official website of Pytorch but I didn't find anything. seems that this API doesn't exist in Pytorch. I know that I can observe the loss during training or I can simply get the test loss and based on it I can know whether the loss decreased or not but I want to use that Keras Structure where I get the loss value and also an Accuracy value. the Keras way looks more clear. I also tried to implement an accuracy function using the r2_score from sklearn but it gave me wierd values:
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
def train(model, optimizer, loss_fn):
def train_step(x, y):
model.train()
optimizer.zero_grad()
out = model(x)
loss = loss_fn(out, y)
loss.backward()
optimizer.step()
return loss.item()
return train_step
def fit(epochs=100):
train_func = train(model, optimizer, criterion)
count, total = 0, 0
loss_list, accuracy_list, iters = [], [], []
for e in range(epochs):
for X, y in train_loader:
loss = train_func(X, y)
count += 1
total += len(y)
if count % 50 == 0:
print("loss= ", loss)
loss_list.append(loss)
iters.append(total)
if count % 100 == 0:
model.eval() # im not sure if we can do this in pytorch. I mean evaluating the model while training! it would be great if you tell me whether this is ok or not
out = model(X)
out = out.detach().numpy()
y = y.detach().numpy()
accuracy = r2_score(y, out) # r2_score is the scikit learn r2 score function.
print("accuracy = ", accuracy) # here i get wierd values and it doesn't get better over time, in contrast the loss decreased over time
accuracy_list.append(accuracy)
return iters, loss_list, accuracy_list
I know how to implement an Accuracy function in case of Classification Problem because it is using discrete values. that is clear to me because the implementation is easy and clear. I must only look which correct prediction did the model made and then calculate accuracy. but in this Case I have continuous values so that's why I couldn't implement the function myself and it surprised me that Pytorch don't have a built in function for this. so could someone maybe tell me how to implement this or where to find an Implementation of it?
another thing is where to use the evaluation and where to set the model in evaluation mode by calling the eval function. should I use it during training like I did in my Code or should I train and then test after training and if I test during training should I call the eval function as I did there or it will affect the training when the loop goes back to training mode? another thing I didn't find it also in Pytorch which is Cross validation. how should I implement it in pytorch if there is no API for it like in Keras?

Accuracy does not exist in regression problems.
A similar measure of "accuracy" for a regression problem might be the R-squared score.
If you are using pytorch to train your neural networks, have a look at the package
torchmetrics. You may find what you need there.

correct = 0
total = 0
with torch.no_grad():
for data in testloader:
images, labels = data
outputs = net(images)
_, predicted = torch.max(outputs.data, 1)
total += labels.size(0)
correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: %d %%' % (
100 * correct / total))
Look here for more info: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html

Training and testing CNN with pytorch. With and without model.eval()

I have two questions:-
I am trying to train a convolution neural network initialized with some pre trained weights (Netwrok contains batch normalization layers as well) (taking reference from here). Before training I want to calculate a validation error using loss_fn = torch.nn.MSELoss().cuda().
And in the reference, the author is using model.eval() before calculating the validation error. But with that result, the CNN model is off from what it should be however when I comment out model.eval(), the output is good (what it should be with pre-trained weights). What could be reason behind it as I have read on many posts that model.eval should be used before testing the model and model.train() before training it.
While calculating the validation error with pre-trained weights and above mentioned loss function what should be the batch size. Shouldn't it be 1 as i want output on each of my input, calculate error with ground truth and in the end take average of all results. If i use higher batch size error is increased. So question is can i use higher batch size if yes what should be the right way. In given code i have given err = float(loss_local) / num_samples but i observed without averaging i.e err = float(loss_local). Error is different for different batch size. I am doing this without model.eval right now.
batch_size = 1
data_path = 'path_to_data'
dtype = torch.FloatTensor
weight_file = 'path_to_weight_file'
val_loader = torch.utils.data.DataLoader(NyuDepthLoader(data_path, val_lists),batch_size=batch_size, shuffle=True, drop_last=True)
model = Model(batch_size)
model.load_state_dict(load_weights(model, weight_file, dtype))
loss_fn = torch.nn.MSELoss().cuda()
# model.eval()
with torch.no_grad():
for input, depth in val_loader:
input_var = Variable(input.type(dtype))
depth_var = Variable(depth.type(dtype))
output = model(input_var)
input_rgb_image = input_var[0].data.permute(1, 2, 0).cpu().numpy().astype(np.uint8)
input_gt_depth_image = depth_var[0][0].data.cpu().numpy().astype(np.float32)
pred_depth_image = output[0].data.squeeze().cpu().numpy().astype(np.float32)
print (format(type(depth_var)))
pred_depth_image_resize = cv2.resize(pred_depth_image, dsize=(608, 456), interpolation=cv2.INTER_LINEAR)
target_depth_transform = transforms.Compose([flow_transforms.ArrayToTensor()])
pred_depth_image_tensor = target_depth_transform(pred_depth_image_resize)
#both inputs to loss_fn are 'torch.Tensor'
loss_local += loss_fn(pred_depth_image_tensor, depth_var)
num_samples += 1
print ('num_samples {}'.format(num_samples))
err = float(loss_local) / num_samples
print('val_error before train:', err)

What could be reason behind it as I have read on many posts that model.eval should be used before testing the model and model.train() before training it.
Note: testing the model is called inference.
As explained in the official documentation:
Remember that you must call model.eval() to set dropout and batch normalization layers to evaluation mode before running inference. Failing to do this will yield inconsistent inference results.
So this code must be present once you load the model from a file and do inference.
# Model class must be defined somewhere
model = torch.load(PATH)
model.eval()
This is because dropout works as a regularization for preventing overfitting during training, it is not needed for inference. Same for the batch norms.
When you use eval() this just sets module train label to False and affects only certain types of modules in particular Dropout and BatchNorm.

How to inject data into a graph when using an input pipeline?

I am using an initializable iterator in my code. The iterator returns batches of size 100 from a csv dataset that has 20.000 entries. During training, however, I came across a problem. Consider this piece of code:
def get_dataset_iterator(batch_size):
# parametrized with batch_size
dataset = ...
return dataset.make_initializable_iterator()
## build a model and train it (x is the input of my model)
iterator = get_dataset_iterator(100)
x = iterator.get_next()
y = model(x)
## L1 norm as loss, this works because the model is an autoencoder
loss = tf.abs(x - y)
## training operator
train_op = tf.train.AdamOptimizer(0.01).minimize(loss)
with tf.Session() as sess:
for epoch in range(100):
sess.run(iterator.initializer)
# iterate through the whole dataset once during the epoch and
# do 200 mini batch updates
for _ in range(number_of_samples // batch_size):
sess.run(train_op)
print(f'Epoch {epoch} training done!')
# TODO: print loss after epoch here
I am interested in the training loss AFTER finishing the epoch. It makes most sense to me that I calculate the average loss over the whole training set (e.g. feeding all 20.000 samples through the network and averaging their loss). I could reuse the dataset iterator here with a batch size of 20.000, but I have declared x as the input.
So the questions are:
1.) Does the loss calculation over all 20.000 examples make sense? I have seen some people do the calculation with just a mini-batch (the last batch of the epoch).
2.) How can I calculate the loss over the whole training set with an input pipeline? I have to inject all of training data somehow, so that I can run sess.run(loss) without calculating it over only 100 samples (because x is declared as input).
EDIT FOR CLARIFICATION:
If I wrote my training loop the following way, there would be some things that bother me:
with tf.Session() as sess:
for epoch in range(100):
sess.run(iterator.initializer)
# iterate through the whole dataset once during the epoch and
# do 200 mini batch updates
for _ in range(number_of_samples // batch_size):
_, current_loss = sess.run([train_op, loss])
print(f'Epoch {epoch} training done!')
print(current_loss)
Firstly, loss would still be evaluated before doing the last weight update. That means whatever comes out is not the latest value. Secondly, I would not be able to access current_loss after exiting the for loop so I would not be able to print it.

1) Loss calculation over the whole training set (before updating weights) does make sense and is called batch gradient descent (despite using the whole training set and not a mini batch).
However, calculating a loss for your whole dataset before updating weights is slow (especially with large datasets) and training will take a long time to converge. As a result, using a mini batch of data to calculate loss and update weights is what is normally done instead. Although using a mini batch will produce a noisy estimate of the loss it is actually good enough estimate to train networks with enough training iterations.
EDIT:
I agree that the loss value you print will not be the latest loss with the latest updated weights. Probably for most cases it really doesn't make much different or change results so people just go with how you have wrote the code above. However, if you really want to obtain the true latest loss value after you have done training (to print out) then you will just have to run the loss op again after you have done a train op e.g.:
for _ in range(number_of_samples // batch_size):
sess.run([train_op])
current_loss = sess.run([loss])
This will get your true latest value. Of course this wont be on the whole dataset and will be just for a minibatch of 100. Again the value is likely a good enough estimate but if you wish to calculate exact loss for whole dataset you will have to run through your entire set e.g. another loop and then average the loss:
...
# Train loop
for _ in range(number_of_samples // batch_size):
_, current_loss = sess.run([train_op, loss])
print(f'Epoch {epoch} training done!')
# Calculate loss of whole train set after training an epoch.
sess.run(iterator.initializer)
current_loss_list = []
for _ in range(number_of_samples // batch_size):
_, current_loss = sess.run([loss])
current_loss_list.append(current_loss)
train_loss_whole_dataset = np.mean(current_loss_list)
print(train_loss_whole_dataset)
EDIT 2:
As pointed out doing the serial calls to train_op then loss will call the iterator twice and so things might not work out nicely (e.g. run out of data). Therefore my 2nd bit of code will be better to use.

I think the following code will answer your questions:
(A) how can you print the batch loss AFTER performing the train step? (B) how can you calculate the loss over the entire training set, even though the dataset iterator gives only a batch each time?
import tensorflow as tf
import numpy as np
dataset_size = 200
batch_size= 5
dimension = 4
# create some training dataset
dataset = tf.data.Dataset.\
from_tensor_slices(np.random.normal(2.0,size=(dataset_size,dimension)).
astype(np.float32))
dataset = dataset.batch(batch_size) # take batches
iterator = dataset.make_initializable_iterator()
x = tf.cast(iterator.get_next(),tf.float32)
w = tf.Variable(np.random.normal(size=(1,dimension)).astype(np.float32))
loss_func = lambda x,w: tf.reduce_mean(tf.square(x-w)) # notice that the loss function is a mean!
loss = loss_func(x,w) # this is the loss that will be minimized
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
# we are going to use control_dependencies so that we know that we have a loss calculation AFTER the train step
with tf.control_dependencies([train_op]):
loss_after_train_op = loss_func(x,w) # this is an identical loss, but will only be calculated AFTER train_op has
# been performed
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
# train one epoch
sess.run(iterator.initializer)
for i in range(dataset_size//batch_size):
# the training step will update the weights based on ONE batch of examples each step
loss1,_,loss2 = sess.run([loss,train_op,loss_after_train_op])
print('train step {:d}. batch loss before step: {:f}. batch loss after step: {:f}'.format(i,loss1,loss2))
# evaluate loss on entire training set. Notice that this calculation assumes the the loss is of the form
# tf.reduce_mean(...)
sess.run(iterator.initializer)
epoch_loss = 0
for i in range(dataset_size // batch_size):
batch_loss = sess.run(loss)
epoch_loss += batch_loss*batch_size
epoch_loss = epoch_loss/dataset_size
print('loss over entire training dataset: {:f}'.format(epoch_loss))
As for your question whether it makes sense to calculate loss over the entire training set - yes, it makes sense, for evaluation purposes. It usually does not make sense to perform training steps which are based on all of the training set since this set is usually very large and you want to update your weights more often, without needing to go over the entire training set each time.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Evaluate the model during training affects its performance PyTorch - python

Related

BERT Classifier ValueError: Target size (torch.Size([4, 1])) must be the same as input size (torch.Size([4, 2]))

Cannot improve model accuracy

how to evaluate and get accuracy of a Feed forward neural network with pytorch

Training and testing CNN with pytorch. With and without model.eval()

How to inject data into a graph when using an input pipeline?

Categories

Resources