I'm new to BERT and trying to test it on my dataset. The code is as follows:
# Import BERT model and Tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
bert = AutoModelForMaskedLM.from_pretrained("bert-base-chinese")

class BERT_Arch(nn.Module):
    def __init__(self, bert):
        super(BERT_Arch, self).__init__()
        self.bert = bert
        # dropout layer
        self.dropout = nn.Dropout(0.1)
        # relu activation function
        self.relu = nn.ReLU()
        # dense layer 1
        self.fc1 = nn.Linear(768, 512)
        # dense layer 2 (output layer)
        self.fc2 = nn.Linear(512, 2)
        # softmax activation function
        self.softmax = nn.LogSoftmax(dim=1)

    # define the forward pass
    def forward(self, sent_id, mask):
        # pass the inputs to the model
        _, cls_hs = self.bert(sent_id, attention_mask=mask)
        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)
        # output layer
        x = self.fc2(x)
        # apply softmax activation
        x = self.softmax(x)
        return x
# function to train the model
def train():
    model.train()
    total_loss, total_accuracy = 0, 0
    # empty list to save model predictions
    total_preds = []
    # iterate over batches
    for step, batch in enumerate(train_dataloader):
        # progress update after every 50 batches
        if step % 50 == 0 and not step == 0:
            print('Batch {:>5,} of {:>5,}.'.format(step, len(train_dataloader)))
        # push the batch to gpu
        batch = [r.to(device) for r in batch]
        sent_id, mask, labels = batch
        # clear previously calculated gradients
        model.zero_grad()
        # get model predictions for the current batch
        preds = model(sent_id, mask)
        # compute the loss between actual and predicted values
        loss = cross_entropy(preds, labels)
        # add on to the total loss
        total_loss = total_loss + loss.item()
        # backward pass to calculate the gradients
        loss.backward()
        # clip the gradients to 1.0; this helps prevent the exploding gradient problem
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        # update parameters
        optimizer.step()
        # model predictions are stored on the GPU, so push them to the CPU
        preds = preds.detach().cpu().numpy()
        # append the model predictions
        total_preds.append(preds)
    # compute the training loss of the epoch
    avg_loss = total_loss / len(train_dataloader)
    # predictions are in the form (no. of batches, batch size, no. of classes);
    # reshape them to (number of samples, no. of classes)
    total_preds = np.concatenate(total_preds, axis=0)
    # return the loss and predictions
    return avg_loss, total_preds
# set initial loss to infinite
best_valid_loss = float('inf')
# empty lists to store training and validation loss of each epoch
train_losses = []
valid_losses = []
# for each epoch
for epoch in range(epochs):
    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))
    # train model
    train_loss, _ = train()
    # evaluate model
    valid_loss, _ = evaluate()
    # save the best model
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')
    # append training and validation loss
    train_losses.append(train_loss)
    valid_losses.append(valid_loss)
    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')
The error that I get is "not enough values to unpack (expected 2, got 1)". I have checked the tensors of input_ids and mask, and they look like the following:
tensor([[101, 102],
[101, 102],
[101, 102],
...,
[101, 102],
[101, 102],
[101, 102]])
tensor([[1, 1],
[1, 1],
[1, 1],
...,
[1, 1],
[1, 1],
[1, 1]])
tensor([0, 0, 0, ..., 0, 0, 0])
I think the dimensions of the tensors are fine, so I don't need to unsqueeze them as other answers suggest. Can someone check this for me? Thanks in advance!
The complete error traceback:
10 print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))
11 #train model
---> 12 train_loss, _ = train()
13 #evaluate model
14 valid_loss, _ = evaluate()
16 model.zero_grad()
17 # get model predictions for the current batch
---> 18 preds = model(sent_id, mask)
19 # compute the loss between actual and predicted values
20 loss = cross_entropy(preds, labels)
19 def forward(self, sent_id, mask):
20 #pass the inputs to the model
---> 21 _, cls_hs = self.bert(sent_id, attention_mask=mask)
22 x = self.fc1(cls_hs)
23 x = self.relu(x)
ValueError: not enough values to unpack (expected 2, got 1)
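For context, recent versions of the transformers library return a single output object by default, and the masked-LM head loaded by AutoModelForMaskedLM only carries logits, so there is no second value to unpack; that alone would produce this exact ValueError. A minimal, hedged sketch (the example sentence is made up) of getting the pooled [CLS] vector from the base encoder instead:

# sketch only, not the original code: load the base encoder rather than the MLM head
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
bert = AutoModel.from_pretrained("bert-base-chinese")      # BertModel, no MLM head

enc = tokenizer(["这是一个测试句子"], return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = bert(enc["input_ids"], attention_mask=enc["attention_mask"])

cls_hs = outputs.pooler_output                             # shape: (batch, 768)
# the tuple-style unpacking in the question works if return_dict=False is passed:
# _, cls_hs = bert(enc["input_ids"], attention_mask=enc["attention_mask"], return_dict=False)
print(cls_hs.shape)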
I am trying to use a pre-trained (ResNet) model on the MNIST dataset, but this error keeps appearing:
RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[100, 1, 28, 28] to have 3 channels, but got 1 channels instead.
This is my code:
MNIST dataset
from torchvision import datasets
import torchvision.transforms as transforms
# number of subprocesses to use for data loading
num_workers = 0
# how many samples per batch to load
batch_size = 100
# convert data to torch.FloatTensor
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.2)
])
# choose the training and test datasets
train_data = datasets.MNIST(root='data', train=True, download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False, download=True, transform=transform)
# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=num_workers)
MLP network definition
from torch.nn.modules.activation import ReLU

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # building the model as a Sequential container
        self.model = nn.Sequential(
            # building the layers as Linear modules
            nn.Linear(28 * 28, 200),          # input layer: 28*28 = 784 features
            nn.Dropout(0.2),                  # Dropout randomly zeroes neurons during training to reduce overfitting
            nn.ReLU(True),                    # activation function
            nn.BatchNorm1d(num_features=200), # batch norm speeds up and stabilizes training by re-centering and re-scaling the layer's inputs
            nn.Linear(200, 10),               # output layer = 10 classes
        )

    def forward(self, x):
        x = x.view(-1, 1, 28*28)
        return self.model(x)

# initialize the network
model = Net()
print(model)
model = resnet50(weights = ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(512,10)
define an optimizer to update the model parameters
## Specify loss and optimization functions
# specify loss function
criterion = nn.CrossEntropyLoss()
# specify optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
Training Data
# number of epochs to train the model
n_epochs = 1  # suggest training between 20-50 epochs
model.train()  # prep model for training
for epoch in range(n_epochs):
    # monitor training loss
    train_loss = 0.0
    ###################
    # train the model #
    ###################
    for data, target in train_loader:
        # repeat the single grayscale channel to get 3 channels
        data = data.repeat(1, 3, 1, 1)
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update running training loss
        train_loss += loss.item() * data.size(0)
    # print training statistics
    # calculate average loss over an epoch
    train_loss = train_loss / len(train_loader.dataset)
    print('Epoch: {} \tTraining Loss: {:.6f}'.format(
        epoch + 1,
        train_loss
    ))
Initialize lists to monitor test loss and accuracy
# initialize lists to monitor test loss and accuracy
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
model.eval()  # prep model for *evaluation*
for data, target in test_loader:
    # forward pass: compute predicted outputs by passing inputs to the model
    output = model(data)
    # calculate the loss
    loss = criterion(output, target)
    # update test loss
    test_loss += loss.item() * data.size(0)
    # convert output probabilities to predicted class
    _, pred = torch.max(output, 1)
    # compare predictions to true label
    correct = np.squeeze(pred.eq(target.data.view_as(pred)))
    # calculate test accuracy for each object class
    for i in range(16):
        label = target.data[i]
        class_correct[label] += correct[i].item()
        class_total[label] += 1
# calculate and print avg test loss
test_loss = test_loss / len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))
for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            str(i), 100 * class_correct[i] / class_total[i],
            np.sum(class_correct[i]), np.sum(class_total[i])))
    else:
        print('Test Accuracy of %5s: N/A (no training examples)' % (classes[i]))
print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))
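For what it's worth, the traceback complains wherever single-channel batches reach ResNet's 3-channel first convolution; the training loop works around it with data.repeat(1, 3, 1, 1), but the test loop above does not. A hedged sketch of one alternative, converting to three channels inside the transform so both loaders already yield [N, 3, 28, 28] batches (assuming torchvision's Grayscale transform is acceptable here):

from torchvision import datasets, transforms

# replicate the single MNIST channel to three channels before ToTensor()
transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.2),
])
train_data = datasets.MNIST(root='data', train=True, download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False, download=True, transform=transform)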
I am training a model to classify 2 types of images. I have decided to take a transfer-learning approach: freeze every part of resnet50, add a new final layer, and fine-tune only that layer. My dataset is not perfectly balanced, but I used class weights for that purpose. Please take a look at the validation loss vs. training loss graph. It seems to be extremely inconsistent. Could you please take a look at my code? I am new to PyTorch; maybe there is something wrong with my method or code. Final accuracy on the test set is 86%. Thank you!
learning_rate = 1e-1
num_epochs = 100
patience = 10
batch_size = 100
weights = [4, 1]
model = models.resnet50(pretrained=True)
# Replace last layer
num_features = model.fc.in_features
model.fc = nn.Sequential(
    nn.Linear(num_features, 512),
    nn.ReLU(inplace=True),
    nn.Linear(512, 64),
    nn.Dropout(0.5, inplace=True),
    nn.Linear(64, 2))
class_weights = torch.FloatTensor(weights).cuda()
criterion = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
running_loss = 0
losses = []
# To freeze the residual layers
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True
# Find total parameters and trainable parameters
total_params = sum(p.numel() for p in model.parameters())
print(f'{total_params:,} total parameters.')
total_trainable_params = sum(
    p.numel() for p in model.parameters() if p.requires_grad)
print(f'{total_trainable_params:,} training parameters.')
24,590,082 total parameters.
1,082,050 training parameters.
# initialize the early_stopping object
early_stopping = pytorchtools.EarlyStopping(patience=patience, verbose=True)
for epoch in range(num_epochs):
    ##########################
    #######TRAIN MODEL########
    ##########################
    epochs_loss = 0
    ## Switch to train mode
    model.train()
    for i, (images, labels) in enumerate(train_dl):
        # Move tensors to the configured device
        images = images.to(device)
        labels = labels.to(device)
        # Forward pass, backpropagation and optimization
        optimizer.zero_grad()
        outputs = model(images).to(device)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # calculate train_loss
        train_losses.append(loss.item())
    ##########################
    #####VALIDATE MODEL#######
    ##########################
    model.eval()
    for images, labels in val_dl:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images).to(device)
        loss = criterion(outputs, labels)
        valid_losses.append(loss.item())
    # print training/validation statistics
    # calculate average loss over an epoch
    train_loss = np.average(train_losses)
    valid_loss = np.average(valid_losses)
    # print(train_loss)
    avg_train_losses.append(train_loss)
    avg_valid_losses.append(valid_loss)
    print_msg = (f'train_loss: {train_loss:.5f} ' + f'valid_loss: {valid_loss:.5f}')
    print(print_msg)
    # clear lists to track next epoch
    train_losses = []
    valid_losses = []
    early_stopping(valid_loss, model)
    print(epoch)
    if early_stopping.early_stop:
        print("Early stopping")
        break
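Two small, hedged suggestions on the setup above (reusing the names from the code, not a full rewrite): since only the new fc head is trainable, the optimizer can be given just those parameters, and the validation pass can run under torch.no_grad() so it never builds a graph:

# optimizer over the trainable head only (assumes model and learning_rate as defined above)
optimizer = torch.optim.SGD(model.fc.parameters(), lr=learning_rate)

# validation pass without gradient tracking (assumes criterion, val_dl, device as above)
model.eval()
valid_losses = []
with torch.no_grad():
    for images, labels in val_dl:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        loss = criterion(outputs, labels)
        valid_losses.append(loss.item())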
I am new to PyTorch and I've been working on training an MLP model on the MNIST dataset. Basically, I am feeding the model images and labels as input and training on them. I am using CrossEntropyLoss() as the loss function, but I get a dimension error whenever I run the model.
IndexError Traceback (most recent call last)
<ipython-input-37-04f8cfc1d3b6> in <module>()
47
48 # Forward
---> 49 outputs = model(images)
50
5 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/flatten.py in forward(self, input)
38
39 def forward(self, input: Tensor) -> Tensor:
---> 40 return input.flatten(self.start_dim, self.end_dim)
41
42 def extra_repr(self) -> str:
IndexError: Dimension out of range (expected to be in range of [-4, 3], but got 64)
Here is the MLP class that I've created
class MLP(nn.Module):
    def __init__(self, device, input_size = 1*28*28, output_size = 10):
        super().__init__()
        self.seq = nn.Sequential(nn.Flatten(BATCH=64, input_size),
                                 nn.Linear(input_size, 32),
                                 nn.ReLU(),
                                 nn.Linear(32, output_size))
        self.to(device)

    def forward(self, x):
        return self.seq(x)
And the rest of the training code is:
from tqdm.notebook import tqdm
from datetime import datetime
from torch.utils.tensorboard import SummaryWriter
import torch.optim as optim

exp_name = "MLP version 1"
# log_name = "logs/" + exp_name + f" {datetime.now()}"
# print("Tensorboard logs will be written to:", log_name)
# writer = SummaryWriter(log_name)
criterion = nn.CrossEntropyLoss()
model = MLP(device)
optimizer = torch.optim.Adam(model.parameters(), lr = 0.0001)
num_epochs = 10

for epoch in tqdm(range(num_epochs)):
    epoch_train_loss = 0.0
    epoch_accuracy = 0.0
    for data in train_loader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        images = images.permute(0, 3, 1, 2)
        optimizer.zero_grad()
        print("hello")
        outputs = model(images)
        loss = criterion(outputs, labels)
        epoch_train_loss += loss.item()
        loss.backward()
        optimizer.step()
        accuracy = compute_accuracy(outputs, labels)
        epoch_accuracy += accuracy
    writer.add_scalar("Loss/training", epoch_train_loss, epoch)
    writer.add_scalar("Accuracy/training", epoch_accuracy / len(train_loader), epoch)
    print('epoch: %d loss: %.3f' % (epoch + 1, epoch_train_loss / len(train_loader)))
    print('epoch: %d accuracy: %.3f' % (epoch + 1, epoch_accuracy / len(train_loader)))
    epoch_accuracy = 0.0
    # The code below computes the validation results
    for data in val_loader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        images = images.permute(0, 3, 1, 2)
        model.eval()
        with torch.no_grad():
            outputs = model(images)
            accuracy = compute_accuracy(outputs, labels)
            epoch_accuracy += accuracy
    writer.add_scalar("Accuracy/validation", epoch_accuracy / len(val_loader), epoch)
print("finished training")
Use nn.Flatten() instead of nn.Flatten(BATCH=64, input_size); by default it flattens every dimension after the batch dimension.
https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html
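A minimal sketch of the suggested fix (same layer sizes as in the question; nn.Flatten() with its defaults turns a [N, 1, 28, 28] batch into [N, 784]):

import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, device, input_size=1 * 28 * 28, output_size=10):
        super().__init__()
        self.seq = nn.Sequential(
            nn.Flatten(),                 # [N, 1, 28, 28] -> [N, 784]
            nn.Linear(input_size, 32),
            nn.ReLU(),
            nn.Linear(32, output_size),
        )
        self.to(device)

    def forward(self, x):
        return self.seq(x)

model = MLP("cpu")
print(model(torch.randn(64, 1, 28, 28)).shape)  # torch.Size([64, 10])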
Basically the same question as this one here, which was never answered: Why the first convolutional layer weights don't change during training?
I just want to watch the weights of my convolutional layers as they change during training. How can I do this? No matter what I do, the weights seem to stay the same even though loss is decreasing.
I'm trying to follow this tutorial here although the model is slightly different: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py
Model
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.pool1 = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.pool2 = nn.MaxPool2d(2)
        self.out = nn.Linear(400, 10)

    def forward(self, inputs):
        x = self.pool1(F.relu(self.conv1(inputs)))
        x = self.pool2(F.relu(self.conv2(x)))
        x = torch.flatten(x, start_dim=1)
        x = self.out(x)
        return x
Training
def train(epochs=100):
    criterion = nn.CrossEntropyLoss()
    net = CNN()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
    losses = []
    for epoch in range(epochs):  # loop over the dataset multiple times
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data
            # zero the parameter gradients
            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            w = model.conv1._parameters['weight']
            print(w)
            losses.append(running_loss / z)
            if i % 2000 == 1999:  # print every 2000 mini-batches
                print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / 2000))
                running_loss = 0.0
    return net
If you don't use any normalization modules, the closer the weights are to the input of the network, the smaller the gradients and therefore the changes will be, so the changes are probably in the decimals that aren't displayed anymore in your print() statement. To see the changes, I'd suggest saving the weights from one iteration to the next, and subtracting them to display the difference:
...
# clone the snapshot so it does not share storage with the weight that
# optimizer.step() updates in place (initialize w_previous once before the loop)
w = model.conv1._parameters['weight'].detach().clone()
print(w - w_previous)
w_previous = w
...
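To illustrate the idea end to end, here is a small self-contained sketch on a toy model and random data (none of this is the poster's code); the printed per-step deltas of the first conv layer are tiny but nonzero:

import torch
import torch.nn as nn
import torch.optim as optim

conv = nn.Conv2d(1, 6, 3)                      # a first conv layer like the question's
head = nn.Linear(6 * 26 * 26, 10)              # 28x28 input, 3x3 kernel -> 26x26 maps
opt = optim.SGD(list(conv.parameters()) + list(head.parameters()), lr=0.001, momentum=0.9)
criterion = nn.CrossEntropyLoss()

w_previous = conv.weight.detach().clone()
for step in range(5):
    x = torch.randn(8, 1, 28, 28)              # toy batch
    y = torch.randint(0, 10, (8,))
    opt.zero_grad()
    loss = criterion(head(conv(x).flatten(1)), y)
    loss.backward()
    opt.step()
    w = conv.weight.detach().clone()
    print(f"step {step}: max |delta| = {(w - w_previous).abs().max().item():.2e}")
    w_previous = w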
I have a DataFrame of shape (38307, 26) with a timestamp index.
I'm trying to implement an LSTM classifier, but I'm struggling to feed the data into the TensorFlow graph.
The final arrays I'm trying to feed have shapes X_train = (38307, 25) and y_train = (38307, 2).
I have added the code below in case it helps:
# Parameters
learning_rate = 0.001
training_epochs = 100
batch_size = 128
display_step = 10
# Network Parameters
n_input = 25 # features= 25
n_steps = 28 # timesteps
n_hidden = 128 # hidden layer num of features
n_classes = 2 # Binary classification
# TF Graph input
x = tf.placeholder("float32", [None, n_steps, n_input])
y = tf.placeholder("float32", [None, n_classes])
# TF Weights
weights = {
'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
'out': tf.Variable(tf.random_normal([n_classes]))
}
pred = RNN(x, weights, biases)
# Initialize the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until reach max iterations
    for epoch in range(training_epochs):
        avg_cost = 0
        total_batch = int(len(X_train)/batch_size)
        X_batches = np.array_split(X_train, total_batch)
        Y_batches = np.array_split(y_train, total_batch)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = X_batches[i], Y_batches[i]
            # batch_y.shape = (batch_y.shape[0]), 1)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_x,
                                                          y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print(("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(avg_cost)))
    print('Optimization finished')
    # Store session for analysis with TensorBoard
    writer = tf.summary.FileWriter("/tmp/test", sess.graph)
    # Test model
    print("Accuracy:", accuracy.eval({x: X_test, y: y_test}))
    global result
    result = tf.argmax(pred, 1).eval({x: X_test, y: y_test})
EDIT: the RNN function:
def RNN(x, weights, biases):
    # Prepare data shape to match 'rnn' function requirements
    # Current data input shape: (batch_size, n_steps, n_input)
    # Required shape: 'n_steps' tensors list of shape (batch_size, n_input)
    # Permuting batch_size and n_steps
    x = tf.transpose(x, [1, 0, 2])
    # Reshaping to (n_steps*batch_size, n_input)
    x = tf.reshape(x, [-1, n_input])
    # Split to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    x = tf.split(0, n_steps, x)
    # x = tf.split(x, n_steps, 0) # Syntax change this version
    # LSTM tensorflow using rnn from tensorflow.contrib
    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Get LSTM cell output
    outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
    # Linear activation, using rnn inner loop last output
    return tf.matmul(outputs[-1], weights['out']) + biases['out']
Unfortunately, the most important part of your code is hidden in the RNN function.
Some tips to help you out: I guess you are trying to build a dynamic RNN (is that correct?). In that case, a common mistake I see is that people confuse the time-major and batch-major settings of these RNNs. In other words: is your input data [batch, time, variables] or [time, batch, variables]?
More about this can be found here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/api_docs/python/functions_and_classes/shard8/tf.nn.dynamic_rnn.md
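To make the batch-major vs. time-major point concrete, here is a hedged TF 1.x-style sketch (reusing n_steps, n_input, n_hidden, weights and biases from the question; this is not the poster's RNN function):

import tensorflow as tf  # assumes TensorFlow 1.x APIs

# batch-major input: [batch, time, features]
x = tf.placeholder(tf.float32, [None, n_steps, n_input])
cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)

# time_major=False (the default) expects [batch, time, features];
# time_major=True would instead expect [time, batch, features]
outputs, states = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32, time_major=False)

last_output = outputs[:, -1, :]                 # [batch, n_hidden]
logits = tf.matmul(last_output, weights['out']) + biases['out']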