Runtime error when reading data from a training dataset PyTorch - python

I have a sample of data in my training dataset which I am able to view if I print the data, but when accessing it to train the data, I keep getting RuntimeError: Expected object of scalar type Double but got scalar type Float for argument #2 'weight' in call to _thnn_conv2d_forward. I am unable to figure out why this is happening. I have also attached an image at the end to give a better understanding of the error message.
The labels.txt files looks as follows (name of image linking to another folder with corresponding images, center points (x, y) and radius)
0000, 0.67 , 0.69 , 0.26
0001, 0.69 , 0.33 , 0.3
0002, 0.16 , 0.27 , 0.15
0003, 0.54 , 0.33 , 0.17
0004, 0.32 , 0.45 , 0.3
0005, 0.78 , 0.26 , 0.17
0006, 0.44 , 0.49 , 0.19
EDIT: This is the loss function and optimizer I am using optimizer =
optim.Adam(model.parameters(), lr=0.001)
nn.CrossEntropyLoss()
My validate model function is as follows:
def validate_model(model, loader):
model.eval() # eval mode (batchnorm uses moving mean/variance instead of mini-batch mean/variance)
# (dropout is set to zero)
val_running_loss = 0.0
val_running_correct = 0
for int, data in enumerate(loader):
data, target = data['image'].to(device), data['labels'].to(device)
output = model(data)
loss = my_loss(output, target)
val_running_loss = val_running_loss + loss.item()
_, preds = torch.max(output.data, 1)
val_running_correct = val_running_correct+ (preds == target).sum().item()
avg_loss = val_running_loss/len(loader.dataset)
val_accuracy = 100. * val_running_correct/len(loader.dataset)
#----------------------------------------------
# implementation needed here
#----------------------------------------------
return avg_loss, val_accuracy
I have a fit function for working out training loss:
def fit(model, train_dataloader):
model.train()
train_running_loss = 0.0
train_running_correct = 0
for i, data in enumerate(train_dataloader):
print(data)
#I believe this is causing the error, but not sure why.
data, target = data['image'].to(device), data['labels'].to(device)
optimizer.zero_grad()
output = model(data)
loss = my_loss(output, target)
train_running_loss = train_running_loss + loss.item()
_, preds = torch.max(output.data, 1)
train_running_correct = train_running_correct + (preds == target).sum().item()
loss.backward()
optimizer.step()
train_loss = train_running_loss/len(train_dataloader.dataset)
train_accuracy = 100. * train_running_correct/len(train_dataloader.dataset)
print(f'Train Loss: {train_loss:.4f}, Train Acc: {train_accuracy:.2f}')
return train_loss, train_accuracy
and the following train_model function which stores the losses and accuracy in a list:
train_losses , train_accuracy = [], []
validation_losses , val_accuracy = [], []
def train_model(model,
optimizer,
train_loader,
validation_loader,
train_losses,
validation_losses,
epochs=1):
"""
Trains a neural network.
Args:
model - model to be trained
optimizer - optimizer used for training
train_loader - loader from which data for training comes
validation_loader - loader from which data for validation comes (maybe at the end, you use test_loader)
train_losses - adding train loss value to this list for future analysis
validation_losses - adding validation loss value to this list for future analysis
epochs - number of runs over the entire data set
"""
#----------------------------------------------
# implementation needed here
#----------------------------------------------
for epoch in range(epochs):
train_epoch_loss, train_epoch_accuracy = fit(model, train_loader)
val_epoch_loss, val_epoch_accuracy = validate_model(model, validation_loader)
train_losses.append(train_epoch_loss)
train_accuracy.append(train_epoch_accuracy)
validation_losses.append(val_epoch_loss)
val_accuracy.append(val_epoch_accuracy)
return
And when I run the following code I get the Runtime Error:
train_model(model,
optimizer,
train_loader,
validation_loader,
train_losses,
validation_losses,
epochs=2)
ERROR: RuntimeError: Expected object of scalar type Double but got
scalar type Float for argument #2 'weight' in call to
_thnn_conv2d_forward
Here is a screenshot of the error message as well:
ERROR
EDIT: this is how my model looks like, I am supposed to detect circles in an image with the given centers and radius in the labels.txt file and paint over them - the painting function was given, I had create the model and the training and validation.
class CircleNet(nn.Module): # nn.Module is parent class
def __init__(self):
super(CircleNet, self).__init__() #calls init of parent class
#----------------------------------------------
# implementation needed here
#----------------------------------------------
#keep dimensions of input image: (I-F+2P)/S +1= (128-3+2)/1 + 1 = 128
#RGB image = input channels = 3. Use 12 filters for first 2 convolution layers, then double
self.conv1 = nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, stride=1, padding=1)
self.conv2 = nn.Conv2d(in_channels=12, out_channels=12, kernel_size=3, stride=1, padding=1)
self.conv3 = nn.Conv2d(in_channels=12, out_channels=24, kernel_size=3, stride=1, padding=1)
self.conv4 = nn.Conv2d(in_channels=24, out_channels=32, kernel_size=3, stride=1, padding=1)
#Pooling to reduce sizes, and dropout to prevent overfitting
self.pool = nn.MaxPool2d(kernel_size=2)
self.relu = nn.ReLU()
self.drop = nn.Dropout2d(p=0.25)
self.norm1 = nn.BatchNorm2d(12)
self.norm2 = nn.BatchNorm2d(24)
# There are 2 pooling layers, each with kernel size of 2. Output size: 128/(2*2) = 32
# Have 3 output features, corresponding to x-pos, y-pos, radius.
self.fc = nn.Linear(in_features=32 * 32 * 32, out_features=3)
def forward(self, x):
"""
Feed forward through network
Args:
x - input to the network
Returns "x", which is the network's output
"""
#----------------------------------------------
# implementation needed here
#----------------------------------------------
#Conv1
out = self.conv1(x)
out = self.pool(out)
out = self.relu(out)
out = self.norm1(out)
#Conv2
out = self.conv2(out)
out = self.pool(out)
out = self.relu(out)
out = self.norm1(out)
#Conv3
out = self.conv3(out)
out = self.drop(out)
#Conv4
out = self.conv4(out)
out = F.dropout(out, training=self.training)
out = out.view(-1, 32 * 32 * 32)
out = self.fc(out)
return out
EDIT: Should it help here is my custom loss function:
criterion = nn.CrossEntropyLoss()
def my_loss(outputs, labels):
"""
Args:
outputs - output of network ([batch size, 3])
labels - desired labels ([batch size, 3])
"""
loss = torch.zeros(1, dtype=torch.float, requires_grad=True)
loss = loss.to(device)
loss = criterion(outputs, labels)
#----------------------------------------------
# implementation needed here
#----------------------------------------------
# Observe: If you need to iterate and add certain values to loss defined above
# you cannot write: loss +=... because this will raise the error:
# "Leaf variable was used in an inplace operation"
# Instead, to avoid this error write: loss = loss + ...
return loss
Train Loader (given to me):
train_dir = "./train/"
validation_dir = "./validation/"
test_dir = "./test/"
train_dataset = ShapesDataset(train_dir)
train_loader = DataLoader(train_dataset,
batch_size=32,
shuffle=True)
validation_dataset = ShapesDataset(validation_dir)
validation_loader = DataLoader(validation_dataset,
batch_size=1,
shuffle=False)
test_dataset = ShapesDataset(test_dir)
test_loader = DataLoader(test_dataset,
batch_size=1,
shuffle=False)
print("train loader examples :", len(train_dataset))
print("validation loader examples:", len(validation_dataset))
print("test loader examples :", len(test_dataset))
EDIT: This view image, target circle labels and network output was also given:
"""
View first image of a given number of batches assuming that model has been created.
Currently, lines assuming model has been creatd, are commented out. Without a model,
you can view target labels and the corresponding images.
This is given to you so that you may see how loaders and model can be used.
"""
loader = train_loader # choose from which loader to show images
bacthes_to_show = 2
with torch.no_grad():
for i, data in enumerate(loader, 0): #0 means that counting starts at zero
inputs = (data['image']).to(device) # has shape (batch_size, 3, 128, 128)
labels = (data['labels']).to(device) # has shape (batch_size, 3)
img_fnames = data['fname'] # list of length batch_size
#outputs = model(inputs.float())
img = Image.open(img_fnames[0])
print ("showing image: ", img_fnames[0])
labels_str = [ float(("{0:.2f}".format(x))) for x in labels[0]]#labels_np_arr]
#outputs_np_arr = outputs[0] # using ".numpy()" to convert tensor to numpy array
#outputs_str = [ float(("{0:.2f}".format(x))) for x in outputs_np_arr]
print("Target labels :", labels_str )
#print("network coeffs:", outputs_str)
print()
#img.show()
if (i+1) == bacthes_to_show:
break
Here is the output I'm getting, it is supposed to cover the full circle:
Output I am getting
Any idea will be helpful.

I basically added(in the validate_model and fit function) this:
_, target= torch.max(target.data, 1)
below the _, preds = torch.max(output.data, 1) line of code to get the data and target to be of the same length. Also changed the loss function from CrossEntropyLoss to MSELoss
Then in the same functions:
I changed the following line output = model(data) to output = model(data.float())

Related

Word2Vec with negative sampling python implementation

I'm trying to implement word2vec with negative sampling in python almost from scratch and quite new in neural networks and faced some issues. Would be very appreciate for any help.
So, I wrote simple nn with a forward pass. I didn't get which element have to have grad_fn, I'd been getting error like 'tensor have no grad_fn' until I add requires_grad_() on the returning value. Is that correct?
dataset = Word2VecNegativeSampling(data, num_negative_samples, 30000)
dataset.generate_dataset()
wordvec_dim = 10
class Word2VecNegativeSamples(nn.Module):
def __init__(self, num_tokens):
super(Word2VecNegativeSamples, self).__init__()
self.input = nn.Linear(num_tokens, 10, bias=False)
self.output = nn.Linear(10, num_tokens, bias=False)
self.num_tokens = num_tokens
def forward(self, input_index_batch, output_indices_batch):
'''
Implements forward pass with negative sampling
Arguments:
input_index_batch - Tensor of ints, shape: (batch_size, ), indices of input words in the batch
output_indices_batch - Tensor if ints, shape: (batch_size, num_negative_samples+1),
indices of the target words for every sample
Returns:
predictions - Tensor of floats, shape: (batch_size, num_negative_samples+1)
'''
results = []
batch_size = len(input_index_batch)
for i in range(batch_size):
input_one_hot = torch.zeros(self.num_tokens)
input_one_hot[input_index_batch[i]] = 1
forward_result = self.output(self.input(input_one_hot))
results.append(torch.tensor([forward_result[out_index] for out_index in output_indices_batch[i]]))
return torch.stack(results).requires_grad_()
nn_model = Word2VecNegativeSamples(data.num_tokens())
nn_model.type(torch.FloatTensor)
After all i'm trying to train the model, but neither loss nor accuracy changing. Is the code for model prediction correct as well?
Here is training code:
def train_neg_sample(model, dataset, train_loader, optimizer, scheduler, num_epochs):
loss = nn.BCEWithLogitsLoss().type(torch.FloatTensor)
loss_history = []
train_history = []
for epoch in range(num_epochs):
model.train() # Enter train mode
dataset.generate_dataset()
loss_accum = 0
correct_samples = 0
total_samples = 0
for i_step, (inp, out, lab) in enumerate(train_loader):
prediction = model(inp, out)
loss_value = loss(prediction, lab)
optimizer.zero_grad()
loss_value.backward()
optimizer.step()
_, indices = torch.max(prediction, 1)
correct_samples += torch.sum(indices == 0)
total_samples += lab.shape[0]
loss_accum += loss_value
scheduler.step()
ave_loss = loss_accum / i_step
train_accuracy = float(correct_samples) / total_samples
loss_history.append(float(ave_loss))
train_history.append(train_accuracy)
print("Epoch#: %i, Average loss: %f, Train accuracy: %f" % (epoch, ave_loss, train_accuracy))
return loss_history, train_history
If your loss function is not changing, it's highly probable that you register the wrong set of parameters to the optimizer. Can you post the code snippet where you initialize your model and optimizer? It is supposed to look like this:
nn_model = Word2VecNegativeSamples(data.num_tokens())
optimizer = optim.SGD(nn_model.parameters(), lr=0.001, momentum=0.9)

Pytorch error when launching two distinct backward

I am building a simple autoencoder followed by an MLP neural nets. Regarging the autoencoder I am not running into any problem
# ---- Prepare training set ----
x_data = train_set_categorical.drop(["churn"], axis=1).to_numpy()
labels = train_set_categorical.loc[:, "churn"].to_numpy()
dataset = TensorDataset(torch.Tensor(x_data), torch.Tensor(labels) )
loader = DataLoader(dataset, batch_size=127)
# ---- Model Initialization ----
model = AE()
# Validation using MSE Loss function
loss_function = nn.MSELoss()
# Using an Adam Optimizer with lr = 0.1
optimizer = torch.optim.Adam(model.parameters(),
lr = 1e-1,
weight_decay = 1e-8)
epochs = 50
outputs = []
losses = []
for epoch in range(epochs):
for (image, _) in loader:
# Output of Autoencoder
embbeding, reconstructed = model(image)
# Calculating the loss function
loss = loss_function(reconstructed, image)
# The gradients are set to zero,
# the the gradient is computed and stored.
# .step() performs parameter update
optimizer.zero_grad()
loss.backward()
optimizer.step()
# Storing the losses in a list for plotting
losses.append(loss)
if epoch == 49:
outputs.append(embbeding)
But then I am feeding an MLP with the outcome of the autoencoder and this is where things starts to fail
class Feedforward(torch.nn.Module):
def __init__(self):
super().__init__()
self.neural = torch.nn.Sequential(
torch.nn.Linear(33, 260),
torch.nn.ReLU(),
torch.nn.Linear(260, 450),
torch.nn.ReLU(),
torch.nn.Linear(450, 260),
torch.nn.ReLU(),
torch.nn.Linear(260, 1),
torch.nn.Sigmoid()
)
def forward(self, x):
outcome = self.neural(x.float())
return outcome
modelz = Feedforward()
criterion = torch.nn.BCELoss()
opt = torch.optim.Adam(modelz.parameters(), lr = 0.01)
modelz.train()
epoch = 20
for epoch in range(epoch):
opt.zero_grad()
# Forward pass
y_pred = modelz(x_train)
# Compute Loss
loss_2 = criterion(y_pred.squeeze(), torch.tensor(y_train).to(torch.float32))
#print('Epoch {}: train loss: {}'.format(epoch, loss.item()))
# Backward pass
loss_2.backward()
opt.step()
I get the following error:
RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling .backward() or autograd.grad() the first time.
Of course I have tried to add "retain_graph=True" to both backwards or only the first one but it does not seem to solve the problem. If I launch both code independently from another It works but as a sequence I don't know why but it is not.
You should be able to disconnect the output of the auto-encoder from the model by calling embbeding.detach(), before appending it to outputs.

RuntimeError: shape '[4, 512]' is invalid for input of size 1024 while while evaluating test data

I am trying XLnet over Jigsaw toxic dataset.
When I train my data with
input_ids = d["input_ids"].reshape(4,512).to(device) # batch size x seq length
it trains perfectly.
But when I try to test the model with test data with reshaping the input_ids in the same way, it generates a run time error:
shape '[4, 512]' is invalid for input of size 1024
This is the method I am using for training:
def train_epoch(model, data_loader, optimizer, device, scheduler, n_examples):
model = model.train()
losses = []
acc = 0
counter = 0
for d in data_loader:
input_ids = d["input_ids"].reshape(4,512).to(device)
attention_mask = d["attention_mask"].to(device)
targets = d["targets"].to(device)
outputs = model(input_ids=input_ids, token_type_ids=None, attention_mask=attention_mask, labels = targets)
loss = outputs[0]
logits = outputs[1]
# preds = preds.cpu().detach().numpy()
_, prediction = torch.max(outputs[1], dim=1)
targets = targets.cpu().detach().numpy()
prediction = prediction.cpu().detach().numpy()
accuracy = metrics.accuracy_score(targets, prediction)
acc += accuracy
losses.append(loss.item())
loss.backward()
nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
scheduler.step()
optimizer.zero_grad()
counter = counter + 1
return acc / counter, np.mean(losses)
This is the method I am using for evaluating my test data:
def eval_model(model, data_loader, device, n_examples):
model = model.eval()
losses = []
acc = 0
counter = 0
with torch.no_grad():
for d in data_loader:
# print(d["input_ids"])
input_ids = d["input_ids"].reshape(4,512).to(device)
attention_mask = d["attention_mask"].to(device)
targets = d["targets"].to(device)
outputs = model(input_ids=input_ids, token_type_ids=None, attention_mask=attention_mask, labels = targets)
loss = outputs[0]
logits = outputs[1]
_, prediction = torch.max(outputs[1], dim=1)
targets = targets.cpu().detach().numpy()
prediction = prediction.cpu().detach().numpy()
accuracy = metrics.accuracy_score(targets, prediction)
acc += accuracy
losses.append(loss.item())
counter += 1
return acc / counter, np.mean(losses)
And when I try to run the eval_model method with my test data, it generates a run time error.
My model info:
I am unable to understand what wrong I am doing. Can anyone please help me out with this? Thank you.
I think the problem is that the training dataset's d['input_ids'] was of size 4*512 = 2048 so it could be divided into 4 and 512.
But the testing dataset's d['input_ids'] is of size 1024, which cannot be divided into 4 and 512.
Since you haven't given the model description, i can't say if you should change it to (-1, 512) or (4, -1) [using -1 in reshape tells numpy to figure that dimension out automatically.
e.g. reshaping an array of 2048 elements into (4, 512) can be done by reshape(4,512) and reshape(-1, 512) and reshape(4, -1) as well.

How to add BiLSTM on top of BERT from Huggingface + CUDA out of memory. Tried to allocate 16.00 MiB

I have the below code for a binary classification and it works fine but i would like to modify the nn.Sequential parameters and add an BiLSTM layer. I have the below code:
class BertClassifier(nn.Module):
def __init__(self, freeze_bert=False):
super(BertClassifier, self).__init__()
# Specify hidden size of BERT, hidden size of our classifier, and number of labels
D_in, H, D_out = 768, 50, 2
# Instantiate BERT model
self.bert = BertModel.from_pretrained('bert-base-multilingual-uncased')
# Instantiate an one-layer feed-forward classifier
self.classifier = nn.Sequential(nn.Linear(D_in, H),nn.ReLU(),nn.Linear(H, D_out))
# Freeze the BERT model
if freeze_bert:
for param in self.bert.parameters():
param.requires_grad = False
def forward(self, input_ids, attention_mask):
# Feed input to BERT
outputs = self.bert(input_ids=input_ids,attention_mask=attention_mask)
# Extract the last hidden state of the token `[CLS]` for classification task
last_hidden_state_cls = outputs[0][:, 0, :]
# Feed input to classifier to compute logits
logits = self.classifier(last_hidden_state_cls)
return logits
I have tried to modify the sequential like this self.classifier = nn.Sequential(nn.LSTM(D_in, H, batch_first=True, bidirectional=True),nn.ReLU(),nn.Linear(H, D_out)) but then it throws the error RuntimeError: input must have 3 dimensions, got 2 on line logits = self.classifier(last_hidden_state_cls). I found that I can use nn.ModuleDict instead of nn.Sequential and i made the below :
self.classifier = nn.ModuleDict({
'lstm': nn.LSTM(input_size=D_in, hidden_size=H,batch_first=True, bidirectional=True ),
'linear': nn.Linear(in_features=H,out_features=D_out)})
But now I'm having issues computing the forward function with this. Can someone advice how i can properly modify the forward function?
Update: I also installed CUDA and now when I run the code it returns the error CUDA out of memory. Tried to allocate 16.00 MiB and I tried to lower the batch size but that doesn't fix the problem. I also tried the below but didn't resolved either. Any advice, please?
import torch, gc
gc.collect()
torch.cuda.empty_cache()
Update with the code:
MAX_LEN = 64
# For fine-tuning BERT, the authors recommend a batch size of 16 or 32.
batch_size = 32
VALID_BATCH_SIZE = 4
file1 = open('MH.txt', 'r')
list_com = []
list_label = []
for line in file1:
possible_labels = 'positive|negative'
label = re.findall(possible_labels, line)
line = re.sub(possible_labels, ' ', line)
line = re.sub('\n', ' ', line)
list_com.append(line)
list_label.append(label[0])
list_tuples = list(zip(list_com, list_label))
file1.close()
labels = ['positive', 'negative']
df = pd.DataFrame(list_tuples, columns=['text', 'label'])
df['label'] = df['label'].map({'positive': 1, 'negative': 0})
for i in range(0,len(df['label'])):
list_label[i] = df['label'][i]
#print(df)
#print(df['label'].value_counts())
X = df.text.values
y = df.label.values
X_train, X_val, y_train, y_val =\
train_test_split(X, y, test_size=0.1, random_state=2020)
def text_preprocessing(text):
# Remove '#name'
text = re.sub(r'(#.*?)[\s]', ' ', text)
# Replace '&' with '&'
text = re.sub(r'&', '&', text)
# Remove trailing whitespace
text = re.sub(r'\s+', ' ', text).strip()
return text
tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-uncased', do_lower_case=True)
# Create a function to tokenize a set of texts
def preprocessing_for_bert(data):
input_ids = []
attention_masks = []
for sent in data:
encoded_sent = tokenizer.encode_plus(
text=text_preprocessing(sent), # Preprocess sentence
add_special_tokens=True, # Add `[CLS]` and `[SEP]`
max_length=MAX_LEN, # Max length to truncate/pad
pad_to_max_length=True, # Pad sentence to max length
# return_tensors='pt', # Return PyTorch tensor
return_attention_mask=True # Return attention mask
)
# Add the outputs to the lists
input_ids.append(encoded_sent.get('input_ids'))
attention_masks.append(encoded_sent.get('attention_mask'))
# Convert lists to tensors
input_ids = torch.tensor(input_ids)
attention_masks = torch.tensor(attention_masks)
return input_ids, attention_masks
train_inputs, train_masks = preprocessing_for_bert(X_train)
val_inputs, val_masks = preprocessing_for_bert(X_val)
# Convert other data types to torch.Tensor
train_labels = torch.tensor(y_train)
val_labels = torch.tensor(y_val)
# Create the DataLoader for our training set
train_data = TensorDataset(train_inputs, train_masks, train_labels)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)
# Create the DataLoader for our validation set
val_data = TensorDataset(val_inputs, val_masks, val_labels)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler=val_sampler, batch_size=batch_size)
# Create the BertClassfier class
class BertClassifier(nn.Module):
"""Bert Model for Classification Tasks."""
def __init__(self, freeze_bert=False):
"""
#param bert: a BertModel object
#param classifier: a torch.nn.Module classifier
#param freeze_bert (bool): Set `False` to fine-tune the BERT model
"""
super(BertClassifier, self).__init__()
# Specify hidden size of BERT, hidden size of our classifier, and number of labels
D_in, H, D_out = 768, 50, 2
# Instantiate BERT model
self.bert = BertModel.from_pretrained('bert-base-multilingual-uncased')
# Instantiate an one-layer feed-forward classifier
self.classifier = nn.ModuleDict({
'lstm': nn.LSTM(input_size=D_in, hidden_size=H, batch_first=True, bidirectional=True),
'linear': nn.Linear(in_features=H, out_features=D_out)})
# Freeze the BERT model
if freeze_bert:
for param in self.bert.parameters():
param.requires_grad = False
def forward(self, input_ids, attention_mask):
outputs = self.bert(input_ids=input_ids,attention_mask=attention_mask)
sequence_output = outputs[0]
sequence_output, _ = self.lstm(sequence_output)
linear_output = self.linear(sequence_output[:, -1])
return linear_output
def initialize_model(epochs=4):
# Instantiate Bert Classifier
bert_classifier = BertClassifier(freeze_bert=False)
print(bert_classifier)
# Tell PyTorch to run the model on GPU
bert_classifier.to(device)
# Create the optimizer
optimizer = AdamW(bert_classifier.parameters(), lr=5e-5)
# Total number of training steps
total_steps = len(train_dataloader) * epochs
# Set up the learning rate scheduler
scheduler = get_linear_schedule_with_warmup(optimizer,num_warmup_steps=0,num_training_steps=total_steps)
return bert_classifier, optimizer, scheduler
# Specify loss function
loss_fn = nn.CrossEntropyLoss()
def set_seed(seed_value=42):
"""Set seed for reproducibility."""
random.seed(seed_value)
np.random.seed(seed_value)
torch.manual_seed(seed_value)
torch.cuda.manual_seed_all(seed_value)
def train(model, train_dataloader, val_dataloader=None, epochs=4, evaluation=False):
"""Train the BertClassifier model."""
# Start training loop
print("Start training...\n")
for epoch_i in range(epochs):
# Print the header of the result table
print(f"{'Epoch':^7} | {'Batch':^7} | {'Train Loss':^12} | {'Val Loss':^10} | {'Val Acc':^9} | {'Elapsed':^9}")
print("-" * 70)
# Measure the elapsed time of each epoch
t0_epoch, t0_batch = time.time(), time.time()
# Reset tracking variables at the beginning of each epoch
total_loss, batch_loss, batch_counts = 0, 0, 0
# Put the model into the training mode
model.train()
# For each batch of training data...
for step, batch in enumerate(train_dataloader):
batch_counts += 1
# Load batch to GPU
b_input_ids, b_attn_mask, b_labels = tuple(t.to(device) for t in batch)
# Zero out any previously calculated gradients
model.zero_grad()
# Perform a forward pass. This will return logits.
logits = model(b_input_ids, b_attn_mask)
# Compute loss and accumulate the loss values
loss = loss_fn(logits, b_labels)
batch_loss += loss.item()
total_loss += loss.item()
# Perform a backward pass to calculate gradients
loss.backward()
# Clip the norm of the gradients to 1.0 to prevent "exploding gradients"
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
# Update parameters and the learning rate
optimizer.step()
scheduler.step()
# Print the loss values and time elapsed for every 20 batches
if (step % 20 == 0 and step != 0) or (step == len(train_dataloader) - 1):
# Calculate time elapsed for 20 batches
time_elapsed = time.time() - t0_batch
# Print training results
print(
f"{epoch_i + 1:^7} | {step:^7} | {batch_loss / batch_counts:^12.6f} | {'-':^10} | {'-':^9} | {time_elapsed:^9.2f}")
# Reset batch tracking variables
batch_loss, batch_counts = 0, 0
t0_batch = time.time()
# Calculate the average loss over the entire training data
avg_train_loss = total_loss / len(train_dataloader)
print("-" * 70)
#Evaluation
if evaluation == True:
# After the completion of each training epoch, measure the model's performance
# on our validation set.
val_loss, val_accuracy = evaluate(model, val_dataloader)
# Print performance over the entire training data
time_elapsed = time.time() - t0_epoch
print(
f"{epoch_i + 1:^7} | {'-':^7} | {avg_train_loss:^12.6f} | {val_loss:^10.6f} | {val_accuracy:^9.2f} | {time_elapsed:^9.2f}")
print("-" * 70)
print("\n")
print("Training complete!")
def evaluate(model, val_dataloader):
"""After the completion of each training epoch, measure the model's performance
on our validation set.
"""
# Put the model into the evaluation mode. The dropout layers are disabled during
# the test time.
model.eval()
# Tracking variables
val_accuracy = []
val_loss = []
# For each batch in our validation set...
for batch in val_dataloader:
# Load batch to GPU
b_input_ids, b_attn_mask, b_labels = tuple(t.to(device) for t in batch)
# Compute logits
with torch.no_grad():
logits = model(b_input_ids, b_attn_mask)
# Compute loss
loss = loss_fn(logits, b_labels)
val_loss.append(loss.item())
# Get the predictions
preds = torch.argmax(logits, dim=1).flatten()
# Calculate the accuracy rate
accuracy = (preds == b_labels).cpu().numpy().mean() * 100
val_accuracy.append(accuracy)
# Compute the average accuracy and loss over the validation set.
val_loss = np.mean(val_loss)
val_accuracy = np.mean(val_accuracy)
return val_loss, val_accuracy
def accuracy(probs, y_true):
"""
- Print AUC and accuracy on the test set
#params probs (np.array): an array of predicted probabilities with shape (len(y_true), 2)
#params y_true (np.array): an array of the true values with shape (len(y_true),)
fpr, tpr, threshold = roc_curve(y_true, preds)
roc_auc = auc(fpr, tpr)
print(f'AUC: {roc_auc:.4f}')
"""
preds = probs[:, 1]
# Get accuracy over the test set
y_pred = np.where(preds >= 0.5, 1, 0)
accuracy = accuracy_score(y_true, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')
def bert_predict(model, test_dataloader):
"""Perform a forward pass on the trained BERT model to predict probabilities on the test set."""
# Put the model into the evaluation mode. The dropout layers are disabled during the test time.
model.eval()
all_logits = []
# For each batch in our test set...
for batch in test_dataloader:
# Load batch to GPU
b_input_ids, b_attn_mask = tuple(t.to(device) for t in batch)[:2]
# Compute logits
with torch.no_grad():
logits = model(b_input_ids, b_attn_mask)
all_logits.append(logits)
# Concatenate logits from each batch
all_logits = torch.cat(all_logits, dim=0)
# Apply softmax to calculate probabilities
probs = F.softmax(all_logits, dim=1).cpu().numpy()
return probs
set_seed(42) # Set seed for reproducibility
bert_classifier, optimizer, scheduler = initialize_model(epochs=3)
# start training
train(bert_classifier, train_dataloader, val_dataloader, epochs=3, evaluation=True)
# Compute predicted probabilities on the test set
probs = bert_predict(bert_classifier, val_dataloader)
# Evaluate the Bert classifier
accuracy(probs, y_val)

How to implement contractive autoencoder in Pytorch?

I'm trying to create a contractive autoencoder in Pytorch. I found this thread and tried according to that. This is the snippet I wrote based on the mentioned thread:
import datetime
import numpy as np
import torch
import torchvision
from torchvision import datasets, transforms
from torchvision.utils import save_image, make_grid
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import matplotlib.pyplot as plt
%matplotlib inline
dataset_train = datasets.MNIST(root='MNIST',
train=True,
transform = transforms.ToTensor(),
download=True)
dataset_test = datasets.MNIST(root='MNIST',
train=False,
transform = transforms.ToTensor(),
download=True)
batch_size = 128
num_workers = 2
dataloader_train = torch.utils.data.DataLoader(dataset_train,
batch_size = batch_size,
shuffle=True,
num_workers = num_workers,
pin_memory=True)
dataloader_test = torch.utils.data.DataLoader(dataset_test,
batch_size = batch_size,
num_workers = num_workers,
pin_memory=True)
def view_images(imgs, labels, rows = 4, cols =11):
imgs = imgs.detach().cpu().numpy().transpose(0,2,3,1)
fig = plt.figure(figsize=(8,4))
for i in range(imgs.shape[0]):
ax = fig.add_subplot(rows, cols, i+1, xticks=[], yticks=[])
ax.imshow(imgs[i].squeeze(), cmap='Greys_r')
ax.set_title(labels[i].item())
# now let's view some
imgs, labels = next(iter(dataloader_train))
view_images(imgs, labels,13,10)
class Contractive_AutoEncoder(nn.Module):
def __init__(self):
super().__init__()
self.encoder = nn.Linear(784, 512)
self.decoder = nn.Linear(512, 784)
def forward(self, input):
# flatten the input
shape = input.shape
input = input.view(input.size(0), -1)
output_e = F.relu(self.encoder(input))
output = F.sigmoid(self.decoder(output_e))
output = output.view(*shape)
return output_e, output
def loss_function(output_e, outputs, imgs, device):
output_e.backward(torch.ones(output_e.size()).to(device), retain_graph=True)
criterion = nn.MSELoss()
assert outputs.shape == imgs.shape ,f'outputs.shape : {outputs.shape} != imgs.shape : {imgs.shape}'
imgs.grad.requires_grad = True
loss1 = criterion(outputs, imgs)
print(imgs.grad)
loss2 = torch.mean(pow(imgs.grad,2))
loss = loss1 + loss2
return loss
epochs = 50
interval = 2000
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Contractive_AutoEncoder().to(device)
optimizer = optim.Adam(model.parameters(), lr =0.001)
for e in range(epochs):
for i, (imgs, labels) in enumerate(dataloader_train):
imgs = imgs.to(device)
labels = labels.to(device)
outputs_e, outputs = model(imgs)
loss = loss_function(outputs_e, outputs, imgs,device)
optimizer.zero_grad()
loss.backward()
optimizer.step()
if i%interval:
print('')
print(f'epoch/epoechs: {e}/{epochs} loss : {loss.item():.4f} ')
For the sake of brevity I just used one layer for the encoder and the decoder. It should work regardless of number of layers in either of them obviously!
But the catch here is, aside from the fact that I don't know if this is the correct way of doing this, (calculating gradients with respect to the input), I get an error which makes the former solution wrong/not applicable.
That is:
imgs.grad.requires_grad = True
produces the error :
AttributeError : 'NoneType' object has no attribute 'requires_grad'
I also tried the second method suggested in that thread which is as follows:
class Contractive_Encoder(nn.Module):
def __init__(self):
super().__init__()
self.encoder = nn.Linear(784, 512)
def forward(self, input):
# flatten the input
input = input.view(input.size(0), -1)
output_e = F.relu(self.encoder(input))
return output_e
class Contractive_Decoder(nn.Module):
def __init__(self):
super().__init__()
self.decoder = nn.Linear(512, 784)
def forward(self, input):
# flatten the input
output = F.sigmoid(self.decoder(input))
output = output.view(-1,1,28,28)
return output
epochs = 50
interval = 2000
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_enc = Contractive_Encoder().to(device)
model_dec = Contractive_Decoder().to(device)
optimizer = optim.Adam([{"params":model_enc.parameters()},
{"params":model_dec.parameters()}], lr =0.001)
optimizer_cond = optim.Adam(model_enc.parameters(), lr = 0.001)
criterion = nn.MSELoss()
for e in range(epochs):
for i, (imgs, labels) in enumerate(dataloader_train):
imgs = imgs.to(device)
labels = labels.to(device)
outputs_e = model_enc(imgs)
outputs = model_dec(outputs_e)
loss_rec = criterion(outputs, imgs)
optimizer.zero_grad()
loss_rec.backward()
optimizer.step()
imgs.requires_grad_(True)
y = model_enc(imgs)
optimizer_cond.zero_grad()
y.backward(torch.ones(imgs.view(-1,28*28).size()))
imgs.grad.requires_grad = True
loss = torch.mean([pow(imgs.grad,2)])
optimizer_cond.zero_grad()
loss.backward()
optimizer_cond.step()
if i%interval:
print('')
print(f'epoch/epoechs: {e}/{epochs} loss : {loss.item():.4f} ')
but I face the error :
RuntimeError: invalid gradient at index 0 - got [128, 784] but expected shape compatible with [128, 512]
How should I go about this in Pytorch?
Summary
The final implementation for contractive loss that I wrote is as follows:
def loss_function(output_e, outputs, imgs, lamda = 1e-4, device=torch.device('cuda')):
criterion = nn.MSELoss()
assert outputs.shape == imgs.shape ,f'outputs.shape : {outputs.shape} != imgs.shape : {imgs.shape}'
loss1 = criterion(outputs, imgs)
output_e.backward(torch.ones(outputs_e.size()).to(device), retain_graph=True)
# Frobenious norm, the square root of sum of all elements (square value)
# in a jacobian matrix
loss2 = torch.sqrt(torch.sum(torch.pow(imgs.grad,2)))
imgs.grad.data.zero_()
loss = loss1 + (lamda*loss2)
return loss
and inside training loop you need to do:
for e in range(epochs):
for i, (imgs, labels) in enumerate(dataloader_train):
imgs = imgs.to(device)
labels = labels.to(device)
imgs.retain_grad()
imgs.requires_grad_(True)
outputs_e, outputs = model(imgs)
loss = loss_function(outputs_e, outputs, imgs, lam,device)
imgs.requires_grad_(False)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f'epoch/epochs: {e}/{epochs} loss: {loss.item():.4f}')
Full explanation
As it turns out and rightfully #akshayk07 pointed out in the comments, the implementation found in Pytorch forum was wrong in multiple places. The notable thing, being it wasn't implementing the actual contractive loss that was introduced in Contractive Auto-Encoders:Explicit Invariance During Feature Extraction paper! and also aside from that, the implementation wouldn't work at all for obvious reasons that will be explained in a moment.
The changes are obvious so I try to explain what's going on here. First of all note that imgs is not a leaf node, so the gradients would not be retained in the image .grad attribute.
In order to retain gradients for non leaf nodes, you should use retain_graph(). grad is only populated for leaf Tensors. Also imgs.retain_grad() should be called before doing forward() as it will instruct the autograd to store grads into non-leaf nodes.
Update
Thanks to #Michael for pointing out that the correct calculation of Frobenius Norm is actually (from ScienceDirect):
the square root of the sum of the squares of all the matrix entries
and not
the the square root of the sum of the absolute values of all the
matrix entries as explained here
In PyTorch 1.5.0, a high level torch.autograd.functional.jacobian API is added. This should make the contractive objective easier to implement for an arbitrary encoder. For torch>=v1.5.0, the contractive loss would look like this:
contractive_loss = torch.norm(torch.autograd.functional.jacobian(self.encoder, imgs, create_graph=True))
The create_graph argument makes the jacobian differentiable.
The main challenge in implementing the contractive autoencoder is in calculating the Frobenius norm of the Jacobian, which is the gradient of the code or bottleneck layer (vector) with respect to the input layer (vector). This is the regularization term in the loss function. Fortunately, you have done the hard work in solving this for me. Thank you! You are using MSE loss for the first term. Cross entropy loss is sometimes used instead. It's worth considering. I think you are almost there with the Frobenius norm, except that you need to take the square root of the sum of the squares of the Jacobian, where you are calculating the square root of the sum of the absolute values. Here's how I'd define the loss function (sorry I changed notation a little to keep myself straight):
def cae_loss_fcn(code, img_out, img_in, lamda=1e-4, device=torch.device('cuda')):
# First term in the loss function, for ensuring representational fidelity
criterion=nn.MSELoss()
assert img_out.shape == img_in.shape, f'img_out.shape : {img_out.shape} != img_in.shape : {img_in.shape}'
loss1 = criterion(img_out, img_in)
# Second term in the loss function, for enforcing contraction of representation
code.backward(torch.ones(code.size()).to(device), retain_graph=True)
# Frobenius norm of Jacobian of code with respect to input image
loss2 = torch.sqrt(torch.sum(torch.pow(img_in.grad, 2))) # THE CORRECTION
img_in.grad.data.zero_()
# Total loss, the sum of the two loss terms, with weight applied to second term
loss = loss1 + (lamda*loss2)
return loss

Categories

Resources