Training Loss decreasing but Validation Loss is stable - python

I am trying to train a neural network I took from this paper: https://scholarworks.rit.edu/cgi/viewcontent.cgi?referer=&httpsredir=1&article=10455&context=theses. See this image: Neural Network Architecture
I am using pytorch-lightning to use multi-GPU training.
I am feeding this network 3-channel optical flows (UVC: U is horizontal temporal displacement, V is vertical temporal displacement, C represents the confidence map).
The outputs represent the frame-to-frame pose as a vector of 6 floating-point values (translationX, translationY, translationZ, yaw, pitch, roll). Translations vary from -0.25 to 3 meters and rotations from -6 to 6 degrees.
The ground-truth poses are taken from the KITTI odometry dataset, which has 11 video sequences; I used the first 8 for training and a portion of the remaining 3 for validation during training.
I trained the model for 200 epochs (it took 33 hours on 8 GPUs).
During training, the training loss decreases but the validation loss stays constant for the whole run.
transform = transforms.Compose(
    [cv_resize((370, 1242)),
     flow_transform_and_uint8_and_tensor(),
     transforms.Normalize((0.3973, 0.2952, 0.4500), (0.4181, 0.4362, 0.3526))])

batch_size = 8
val_data_percentage = 0.06
epochs = 200
learning_rate = 0.0001

train_dataset = FlowsAndPoses("./uvc_flows_png/train/", "./relative_poses/train/", transform)
test_dataset = FlowsAndPoses("./uvc_flows_png/test/", "./relative_poses/test/", transform)

dataset_length = len(test_dataset)
test_dataset, val_dataset = random_split(
    test_dataset,
    [int(dataset_length * (1 - val_data_percentage)),
     dataset_length - int(dataset_length * (1 - val_data_percentage))])

print("Train: ", len(train_dataset), " Validation: ", len(val_dataset))
criterion = nn.L1Loss()
class Net(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, 7, 2)
        self.conv2 = nn.Conv2d(64, 128, 5, 2)
        self.conv3 = nn.Conv2d(128, 256, 5, 2)
        self.conv4 = nn.Conv2d(256, 256, 3, 1)
        self.conv5 = nn.Conv2d(256, 512, 3, 2)
        self.conv6 = nn.Conv2d(512, 512, 3, 1)
        self.conv7 = nn.Conv2d(512, 512, 3, 2)
        self.conv8 = nn.Conv2d(512, 512, 3, 1)
        self.conv9 = nn.Conv2d(512, 1024, 3, 2)
        self.fc1 = nn.Linear(32768, 1024)
        self.drop = nn.Dropout(0.5)
        self.fc2 = nn.Linear(1024, 6)
        self.net_relu = nn.LeakyReLU(0.1)

    def forward(self, x):
        x = self.net_relu(self.conv1(x))
        x = self.net_relu(self.conv2(x))
        x = self.net_relu(self.conv3(x))
        x = self.net_relu(self.conv4(x))
        x = self.net_relu(self.conv5(x))
        x = self.net_relu(self.conv6(x))
        x = self.net_relu(self.conv7(x))
        x = self.net_relu(self.conv8(x))
        x = self.net_relu(self.conv9(x))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = self.net_relu(self.fc1(x))
        x = self.drop(x)
        x = self.fc2(x)
        return x

    def training_step(self, batch, batch_idx):
        running_loss = 0
        print("Training: ")
        inputs, labels = batch
        outputs = self.forward(inputs.float())
        loss = criterion(outputs, labels.float())
        self.log("my_loss", loss, on_epoch=True)
        return loss

    def training_epoch_end(self, training_step_outputs):
        training_loss_file = open("losses/training_loss" + str(self.current_epoch) + "_" + str(self.global_step), "w")
        training_loss_file.write(str(training_step_outputs))
        training_loss_file.close()
        try:
            torch.save(self.state_dict(), "checkpoints/trained_model_epoch" + str(self.current_epoch) + ".pth")
        except:
            print("error saving")

    def validation_step(self, batch, batch_idx):
        inputs, labels = batch
        outputs = self.forward(inputs.float())
        loss = criterion(outputs, labels.float())
        self.log("val_loss", loss)
        return loss

    def validation_epoch_end(self, validation_step_outputs):
        valid_loss_file = open("losses/validation_loss" + str(self.current_epoch) + "_" + str(self.global_step), "w")
        valid_loss_file.write(str(validation_step_outputs))
        valid_loss_file.close()

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=learning_rate)
        return optimizer
autoencoder = Net()
trainer = pl.Trainer(gpus=[0,1,2,3,4,5,6,7], accelerator="gpu", strategy="ddp", enable_checkpointing=True, max_epochs=epochs, check_val_every_n_epoch=1)
trainer.fit(autoencoder, DataLoader(train_dataset, batch_size=batch_size, shuffle=True), DataLoader(val_dataset, batch_size=batch_size, shuffle=True))
Zeroing the gradients and calling optimizer.step() are handled by the pytorch-lightning library.
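For readers unfamiliar with Lightning, the per-batch logic the Trainer runs is roughly equivalent to the manual loop below (a simplified sketch only; the real Trainer additionally handles DDP synchronization, logging and checkpointing, and train_dataloader here just stands for the DataLoader passed to trainer.fit):

# roughly what the Trainer does per batch with the module above (simplified sketch)
optimizer = autoencoder.configure_optimizers()
for batch_idx, batch in enumerate(train_dataloader):
    loss = autoencoder.training_step(batch, batch_idx)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()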
The results I got are in the following images:
Training loss
Validation loss during training
If anyone has suggestions on how to address this problem, I would really appreciate it.

Related

Autoencoder very weird loss spikes when training

Introduction:
I am trying to make an autoencoder learn 32 features (position, velocity, etc.) over 32 time steps, i.e. a 32x32 "image".
For this I made a simple linear model with a symmetric encoder and decoder, using the Tanh activation in every layer.
During training, I added my own version of dropout for just the input (in the future I will use nn.Dropout).
Problem:
I get large spikes in the loss function sqrt(MSE) at irregular intervals (batch_size = 6000).
Loss Graph
What I have tried (small test, 1000 epochs max):
clip_grad_norm_(model.parameters(), max_norm=0.5).
Tried ReLU and ELU activation functions.
Batch size = N / 2 (I wanted to use N but my GPU memory was not enough).
Not adding noise or dropout (I think the noise/dropout helps but does not solve the problem).
Removing the square root from the MSE loss.
Can someone explain to me why this happens and how to fix it?
def rand_bin_array(p_zeros, shape):
    size = 1
    for e in shape:
        size *= e
    arr = np.ones(size)
    arr[:int(size * p_zeros)] = 0
    np.random.shuffle(arr)
    arr = arr.reshape(shape)
    return arr

class Autoencoder_Liniar(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(1024, 921),
            nn.Tanh(),
            nn.Linear(921, 736),
            nn.Tanh(),
            nn.Linear(736, 515),
            nn.Tanh(),
            nn.Linear(515, 309),
            nn.Tanh(),
            nn.Linear(309, 128),
            nn.Tanh(),
            nn.Linear(128, 64),
            nn.Tanh(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(64, 128),
            nn.Tanh(),
            nn.Linear(128, 309),
            nn.Tanh(),
            nn.Linear(309, 515),
            nn.Tanh(),
            nn.Linear(515, 736),
            nn.Tanh(),
            nn.Linear(736, 921),
            nn.Tanh(),
            nn.Linear(921, 1024),
            nn.Tanh()
        )

    def forward(self, x):
        enc = self.encoder(x)
        dec = self.decoder(enc)
        return dec
torch.manual_seed(0)
model = Autoencoder_Liniar().cuda()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
random.seed(0)
epochs = 10000
batch_size = 6000
test_b_size = 5000
train_losses = []
test_losses = []
for i in range(epochs):
    avg_loss = 0
    random.shuffle(train_data)
    for b in range(train_nr // batch_size):
        start = b * batch_size
        data = torch.FloatTensor(train_data[start : start + batch_size]).cuda()
        noise_power = max(0.8 - i / epochs, 0.1)
        noise = torch.FloatTensor(rand_bin_array(noise_power, data.shape)).cuda()
        y_pred = model(data * noise)
        loss = torch.sqrt(criterion(y_pred, data))
        optimizer.zero_grad()
        loss.backward()
        # torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
        optimizer.step()
        avg_loss += loss.item()
        if b % 20 == 0:
            print(f'EPOCH: {i} BATCH: {b} LOSS: {loss.item()}')
    train_losses.append(avg_loss / (train_nr // batch_size))

    with torch.no_grad():
        avg_loss = 0
        for b in range(test_nr // test_b_size):
            start = b * test_b_size
            data = np.array(test_data[start : start + test_b_size])
            data = torch.FloatTensor(data).cuda()
            y_pred = model(data)
            loss = torch.sqrt(criterion(y_pred, data))
            avg_loss += loss.item()
        test_losses.append(avg_loss / (test_nr // test_b_size))
Here is the code I added for getting the gradient's norm over epochs (without noise/dropout), with the gradient clipped at 0.3:
total_norm = 0
for p in model.parameters():
    param_norm = p.grad.detach().data.norm(2)
    total_norm += param_norm.item() ** 2
total_norm = total_norm ** 0.5
avg_grad += total_norm
optimizer.step()
The answer was clipping the gradient with clip_grad_norm_, but at a lower value.
The value was chosen after plotting the gradient's norm over epochs.
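For clarity, a minimal sketch of where the clipping call sits in the training loop above, using the 0.3 threshold read off the gradient-norm plot (the right value depends on your own model):

optimizer.zero_grad()
loss.backward()
# clip after backward() and before step(); max_norm taken from the gradient-norm plot
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.3)
optimizer.step()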

PyTorch Error while building CNN: "1only batches of spatial targets supported (3D tensors) but got targets of size: : [1, 2, 64, 64]"

I want to build a CNN like the one in this paper: https://arxiv.org/abs/1603.08511 (https://richzhang.github.io/colorization/ ).
As data I have images in the LAB color space. I wrote a data loader for the L and a, b values; the L values are given as input to my neural network and the a, b values as labels.
I get an error "1only batches of spatial targets supported (3D tensors) but got targets of size: : [1, 2, 64, 64]" in the criterion loss function.
There is a problem with what I am inserting as "label" into the criterion() method, but the dimensions of the label seem right to me: [1, 2, 64, 64] --> [batch_size, in_channels (a, b), width, height].
I set the batch_size to 1 just to see if it works. I tried to cut off the batch_size dimension using torch.squeeze(), but it didn't work. I don't understand why I can't pass a tensor of this shape and size to the criterion() function. Any help is appreciated! My code is below:
#importing the libraries
import numpy as np
import pandas as pd
from numpy import random
# for creating validation set
from sklearn.model_selection import train_test_split
# PyTorch libraries and modules
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import transforms
from torch.autograd import Variable
from torch.nn import Linear, ReLU, CrossEntropyLoss, Sequential, Conv2d, MaxPool2d, Module, Softmax, BatchNorm2d, Dropout
from torch.optim import Adam, SGD
from torch.utils.data import Dataset, DataLoader
from torchvision import datasets
from torch.utils.data.sampler import SubsetRandomSampler
from typing import Any, Tuple
# set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# define local paths
L_path = 'l/gray_scale.npy'
ab1_path = 'ab/ab/ab1.npy'
ab2_path = 'ab/ab/ab2.npy'
ab3_path = 'ab/ab/ab3.npy'
image_size = 64
class ColorDataset(Dataset):
    def __init__(self, transformations=None, seed=42) -> None:
        if transformations is None:
            self.transformations = transforms.Compose([
                transforms.ToPILImage(),
                transforms.Resize(image_size),
                transforms.ToTensor()
            ])
        else:
            self.transformations = transformations
        self.seed = seed
        self.L = np.load(L_path)
        self.L = np.expand_dims(self.L, -1)
        # self.L = self.L.transpose((0, 3, 1, 2))
        self.ab = np.concatenate([
            np.load(ab1_path),
            np.load(ab2_path),
            np.load(ab3_path)
        ], axis=0)
        # self.ab = self.ab.transpose((0, 3, 1, 2))
        print("All inputs loaded")

    def __len__(self) -> int:
        return len(self.L)

    def __getitem__(self, index: int) -> Tuple[Any, Any]:
        random.seed(self.seed)
        L = self.transformations(self.L[index])
        random.seed(self.seed)
        ab = self.transformations(self.ab[index])
        return L, ab
# initialize dataset
dataset = ColorDataset()
dataset_size = len(dataset)
# set relative test size (for split)
test_size = 0.3
indices = list(range(dataset_size))
np.random.shuffle(indices)
split = int(np.floor(test_size * dataset_size))
train_index, test_index = indices[split:], indices[:split]
train_sampler = SubsetRandomSampler(train_index)
test_sampler = SubsetRandomSampler(test_index)
# set batch size
batch_size = 1
train_loader = DataLoader(dataset, batch_size=batch_size, sampler=train_sampler, num_workers=0)
test_loader = DataLoader(dataset, batch_size=batch_size, sampler=test_sampler, num_workers=0)
# Network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1_1 = nn.Conv2d(1, 64, 1)
        self.conv1_2 = nn.Conv2d(64, 64, 1)
        self.batch_norm_1 = nn.BatchNorm2d(64)
        self.conv2_1 = nn.Conv2d(64, 128, 1, 2)
        self.conv2_2 = nn.Conv2d(128, 128, 1)
        self.batch_norm_2 = nn.BatchNorm2d(128)
        self.conv3_1 = nn.Conv2d(128, 256, 1, 2)
        self.conv3_2 = nn.Conv2d(256, 256, 1)
        self.conv3_3 = nn.Conv2d(256, 256, 1)
        self.batch_norm_3 = nn.BatchNorm2d(256)
        self.conv4_1 = nn.Conv2d(256, 512, 1, 2)
        self.conv4_2 = nn.Conv2d(512, 512, 1)
        self.conv4_3 = nn.Conv2d(512, 512, 1)
        self.batch_norm_4 = nn.BatchNorm2d(512)
        self.conv5_1 = nn.Conv2d(512, 512, 1)
        self.conv5_2 = nn.Conv2d(512, 512, 1)
        self.conv5_3 = nn.Conv2d(512, 512, 1)
        self.batch_norm_5 = nn.BatchNorm2d(512)
        self.conv6_1 = nn.Conv2d(512, 512, 1)
        self.conv6_2 = nn.Conv2d(512, 512, 1)
        self.conv6_3 = nn.Conv2d(512, 512, 1)
        self.batch_norm_6 = nn.BatchNorm2d(512)
        self.conv7_1 = nn.Conv2d(512, 256, 1)
        self.conv7_2 = nn.Conv2d(256, 256, 1)
        self.conv7_3 = nn.Conv2d(256, 256, 1)
        self.batch_norm_7 = nn.BatchNorm2d(256)
        self.conv8_1 = nn.Conv2d(256, 128, 1)
        self.conv8_2 = nn.Conv2d(128, 128, 1, 1)
        self.conv8_3 = nn.Conv2d(128, 128, 1)

    # define forward pass
    def forward(self, x):
        # Pass data through conv1_1
        x = self.conv1_1(x)
        # Use the rectified-linear activation function over x
        x = F.relu(x)
        x = self.conv1_2(x)
        x = F.relu(x)
        # batch normalization
        x = self.batch_norm_1(x)
        x = self.conv2_1(x)
        x = F.relu(x)
        x = self.conv2_2(x)
        x = F.relu(x)
        # batch normalization
        x = self.batch_norm_2(x)
        x = self.conv3_1(x)
        x = F.relu(x)
        x = self.conv3_2(x)
        x = F.relu(x)
        x = self.conv3_3(x)
        # batch normalization
        x = self.batch_norm_3(x)
        x = self.conv4_1(x)
        x = F.relu(x)
        x = self.conv4_2(x)
        x = F.relu(x)
        x = self.conv4_3(x)
        # batch normalization
        x = self.batch_norm_4(x)
        x = self.conv5_1(x)
        x = F.relu(x)
        x = self.conv5_2(x)
        x = F.relu(x)
        x = self.conv5_3(x)
        # batch normalization
        x = self.batch_norm_5(x)
        x = self.conv6_1(x)
        x = F.relu(x)
        x = self.conv6_2(x)
        x = F.relu(x)
        x = self.conv6_3(x)
        # batch normalization
        x = self.batch_norm_6(x)
        x = self.conv7_1(x)
        x = F.relu(x)
        x = self.conv7_2(x)
        x = F.relu(x)
        x = self.conv7_3(x)
        # batch normalization
        x = self.batch_norm_7(x)
        x = self.conv8_1(x)
        x = F.relu(x)
        x = self.conv8_2(x)
        x = F.relu(x)
        x = self.conv8_3(x)
        return x
model = Net()
optimizer = Adam(model.parameters(), lr=0.07)
# defining the loss function
criterion = CrossEntropyLoss()
# checking if GPU is available
if torch.cuda.is_available():
    model = model.cuda()
    criterion = criterion.cuda()
# labels and network output go into criterion
def train(epoch):
    model.train()
    train_loss = 0
    # train the model
    model.train()  # prep model for training
    for data, label in train_loader:
        data = data.to('cuda')
        label = label.to('cuda')
        print(label.size())
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        label = label.long()  # convert label to long since the criterion expects long
        loss = criterion(output, label)  # l value is data, ab values are labels
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update running training loss
        train_loss += loss.item() * data.size(0)
    # calculate average loss over an epoch
    train_loss = train_loss / len(train_loader.sampler)
    # printing the loss
    print('Epoch : ', epoch + 1, '\t', 'loss :', train_loss)
# defining number of epochs
n_epochs = 1
# empty list to store training losses (actually not used)
train_losses = []
# training the model
for epoch in range(n_epochs):
    train(epoch)
and here is the error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-25-c26ffccd8f7e> in <module>
250 # training the model
251 for epoch in range(n_epochs):
--> 252 train(epoch)
<ipython-input-25-c26ffccd8f7e> in train(epoch)
224 label = label.long() #convert label to long since in criterion long is expected
225 print(label.size())
--> 226 loss = criterion(output, label) #l value is data, ab values are labels
227 # backward pass: compute gradient of the loss with respect to model parameters
228 loss.backward()
~\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~\anaconda3\lib\site-packages\torch\nn\modules\loss.py in forward(self, input, target)
960 def forward(self, input: Tensor, target: Tensor) -> Tensor:
961 return F.cross_entropy(input, target, weight=self.weight,
--> 962 ignore_index=self.ignore_index, reduction=self.reduction)
963
964
~\anaconda3\lib\site-packages\torch\nn\functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
2466 if size_average is not None or reduce is not None:
2467 reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2468 return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
2469
2470
~\anaconda3\lib\site-packages\torch\nn\functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
2264 ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
2265 elif dim == 4:
-> 2266 ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
2267 else:
2268 # dim == 3 or dim > 4
RuntimeError: 1only batches of spatial targets supported (3D tensors) but got targets of size: : [1, 2, 64, 64]
Why are you using CrossEntropyLoss() for this task?
Why does your network output 128-dim "pixels"?
What are the spatial dimensions of your prediction vs the targets?
You are using only 1x1 kernels, meaning your model makes its prediction for each pixel independently of its neighbors. How do you expect it to learn to predict meaningful colorizations?
You should go back to the drawing board and rethink your model and your criterion. Right now this error is the least of your worries.
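For reference on the error itself: with spatial targets, nn.CrossEntropyLoss expects logits of shape [N, C, H, W] and a class-index target of shape [N, H, W] with dtype long, not a [N, 2, H, W] float tensor. A minimal sketch of shapes that do work (the 313 classes echo the quantized ab bins of the linked colorization paper; the numbers are illustrative only):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

logits = torch.randn(1, 313, 64, 64)         # [N, C, H, W] class scores per pixel
target = torch.randint(0, 313, (1, 64, 64))  # [N, H, W] class indices, dtype long

loss = criterion(logits, target)             # works
# passing a float target of shape [1, 2, 64, 64] instead triggers the
# "only batches of spatial targets supported (3D tensors)" error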

Expected hidden[0] size (2, 8, 256), got [8, 256]

The hidden state has the correct shape when I print it, as shown below.
print(h0.shape)
print(x.shape)
torch.Size([2, 8, 256])
torch.Size([8, 300, 300])
But I still get the error: Expected hidden[0] size (2, 8, 256), got [8, 256]
What could be wrong?
The whole code is as follows.
import torch
import torch.nn as nn
import torchvision
import matplotlib.pyplot as plt
import torchvision.transforms as tt
from torchvision.datasets import ImageFolder
from PIL import Image
import numpy as np
from torch.autograd import Variable
seq_len = input_size
hidden_size = 256 #size of hidden layers
num_classes = 5
num_epochs = 20
batch_size = 8
learning_rate = 0.001
# Fully connected neural network with one hidden layer
num_layers = 2 # 2 RNN layers are stacked
class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(LSTM, self).__init__()
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)  # batch must be the first dimension
        # our input needs to have shape
        # x -> (batch_size, seq, input_size)
        self.fc = nn.Linear(hidden_size, num_classes)  # this fc comes after the RNN, so it needs the RNN's last hidden size

    def forward(self, x):
        # according to the documentation of RNN in pytorch,
        # the rnn needs an input and h_0 (the initial hidden state)
        # the following is the initial hidden state
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)  # first dim is number of layers, second is batch size
        # the lstm returns two outputs: the first tensor contains the output features of the last layer for all time steps,
        # the second one is the hidden state
        print(h0.shape)
        print(x.shape)
        out, _ = self.lstm(x, h0)
        print(out.shape)
        # out has shape (batch_size, seq_len, hidden_size)
        # we need to decode the hidden state of only the last time step
        # out (N, 30, 128)
        # since we need only the last time step
        # out (N, 128)
        out = out[:, -1, :]  # -1 for the last time step; take all of N and the hidden size
        out = self.fc(out)
        return out
stacked_lstm_model = LSTM(input_size, hidden_size, num_layers, num_classes).to(device)
# Loss and optimizer
criterion = nn.CrossEntropyLoss()#cross entropy has softmax at output
optimizer = torch.optim.Adam(stacked_lstm_model.parameters(), lr=learning_rate) #optimizer used gradient optimization using Adam
# Train the model
n_total_steps = len(train_dl)
for epoch in range(num_epochs):
    t_losses = []
    for i, (images, labels) in enumerate(train_dl):
        # origin shape: [8, 1, 300, 300]
        # resized: [8, 300, 300]
        images = images.reshape(-1, seq_len, input_size).to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = stacked_lstm_model(images)
        loss = criterion(outputs, labels)
        t_losses.append(loss)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i + 1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{n_total_steps}], Loss: {loss.item():.4f}')
    avgd_trainloss = sum(t_losses) / len(t_losses)

    acc = 0
    v_losses = []
    with torch.no_grad():
        n_correct = 0
        n_samples = 0
        for v_images, v_labels in valid_dl:
            v_images = v_images.reshape(-1, seq_len, input_size).to(device)
            v_labels = v_labels.to(device)
            v_outputs = stacked_lstm_model(v_images)
            v_loss = criterion(v_outputs, v_labels)
            v_losses.append(v_loss)

            # max returns (value, index)
            _, v_predicted = torch.max(v_outputs.data, 1)
            n_samples += v_labels.size(0)
            n_correct += (v_predicted == v_labels).sum().item()
        acc = 100.0 * n_correct / n_samples
        avgd_validloss = sum(v_losses) / len(v_losses)

    print(f'Epoch [{epoch+1}/{num_epochs}], Train loss: {avgd_trainloss.item():.4f}, Valid loss: {avgd_validloss.item():.4f}, Valid accu: {acc.item():.2f}')
# Test the model
# In test phase, we don't need to compute gradients (for memory efficiency)
with torch.no_grad():
    n_correct = 0
    n_samples = 0
    for images, labels in test_dl:
        images = images.reshape(-1, seq_len, input_size).to(device)
        labels = labels.to(device)
        outputs = stacked_lstm_model(images)
        # max returns (value, index)
        _, predicted = torch.max(outputs.data, 1)
        n_samples += labels.size(0)
        n_correct += (predicted == labels).sum().item()
    acc = 100.0 * n_correct / n_samples
    print(f'Accuracy of the network on test images: {acc} %')
The LSTM requires two initial states (the hidden state h_0 and the cell state c_0), not just one. So instead of
h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
use
h0 = (torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device),
      torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device))
So you need to pass the two states as a tuple.
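A minimal, self-contained sketch of the fix, with shapes matching the question (num_layers=2, batch=8, hidden=256; the sequence and input sizes of 300 come from the printed shape of x):

import torch
import torch.nn as nn

num_layers, batch, seq_len, input_size, hidden_size = 2, 8, 300, 300, 256
lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)

x = torch.randn(batch, seq_len, input_size)
h0 = torch.zeros(num_layers, batch, hidden_size)  # initial hidden state
c0 = torch.zeros(num_layers, batch, hidden_size)  # initial cell state

out, (hn, cn) = lstm(x, (h0, c0))                 # both states passed as a tuple
print(out.shape)                                  # torch.Size([8, 300, 256])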

What exactly does the forward function output in Pytorch?

This example is taken verbatim from the PyTorch documentation. I have some background in deep learning and know that the forward call represents a forward pass through the layers, ending with 10 outputs in this case; you then take that output and compute the loss with the loss function you defined. What I am unsure about is what exactly the output of the forward() pass represents in this scenario.
I thought the last layer of a neural network should be some sort of activation function like sigmoid() or softmax(), but I did not see these defined anywhere; furthermore, in a project I am working on now, I found that softmax() is called later on. So I just want to clarify what exactly outputs = net(inputs) gives me. From this link, it seems that by default the output of a PyTorch model's forward pass is logits?
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
net = Net()
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        print(outputs)
        break
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training')
"it seems to me by default the output of a PyTorch model's forward pass is logits"
As I can see from the forward pass, yes, your function is passing back the raw output:
def forward(self, x):
    x = self.pool(F.relu(self.conv1(x)))
    x = self.pool(F.relu(self.conv2(x)))
    x = x.view(-1, 16 * 5 * 5)
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x
So, where is softmax? Right here:
criterion = nn.CrossEntropyLoss()
It's a bit hidden, but the softmax computation is handled inside this loss function, and it of course operates on the raw output of your last layer.
This is the softmax calculation: softmax(z_i) = exp(z_i) / sum_j exp(z_j), where z_i are the raw outputs of the neural network.
So, in conclusion, there is no activation function after your last layer because it is handled by the nn.CrossEntropyLoss class.
As for what the raw output of nn.Linear is: it is simply the linear combination (weights times inputs, plus bias) of the values coming from the neurons of the previous layer.
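A small sketch of that equivalence (the tensors here are made up): nn.CrossEntropyLoss applied to raw logits gives the same value as log-softmax followed by nn.NLLLoss, and F.softmax is only needed when you actually want probabilities, e.g. at inference time:

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 10)            # raw output of the last nn.Linear
labels = torch.randint(0, 10, (4,))

loss_a = nn.CrossEntropyLoss()(logits, labels)
loss_b = nn.NLLLoss()(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(loss_a, loss_b))  # True: the softmax lives inside the loss

probs = F.softmax(logits, dim=1)       # probabilities, if you need them
print(probs.sum(dim=1))                # each row sums to 1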

Loss function used for multi-dimensional feature mapping

I am working on a video animation project using PyTorch. My dataset contains 3904x60 MFCC audio features (input) and corresponding 3904x3 video features (output). The goal is to train a neural network model such that, given an unknown audio feature, the model maps it to its corresponding video feature. In other words, the network performs a 60-to-3 feature mapping. I have already built the neural network following this tutorial:
class ConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv1d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2, stride=2))
        self.drop_out = nn.Dropout()
        self.fc1 = nn.Linear(15 * 64, 1000)
        self.fc2 = nn.Linear(1000, 3)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.drop_out(out)
        out = self.fc1(out)
        out = self.fc2(out)
        return out
and my training code looks like:
model = ConvNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(num_epochs):
    for i, (a, v) in enumerate(train_loader):
        # Run the forward pass
        a = a.float()
        v = v.long()
        outputs = model(a.view(a.size(0), 1, a.size(1)))
        loss = criterion(outputs, v)
        loss_list.append(loss.item())

        # Backprop and perform Adam optimisation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Track the accuracy
        # (note: 'labels' is not defined in this loop; the loop variables are 'a' and 'v')
        total = labels.size(0)
        _, predicted = torch.max(outputs.data, 1)
        correct = (predicted == labels).sum().item()
        acc_list.append(correct / total)

        if (i + 1) % 100 == 0:
            print('Epoch[{}/{}],Step[{}/{}],Loss{:.4f},Accuracy{:.2f}%'
                  .format(epoch + 1, num_epochs, i + 1, total_step, loss.item(),
                          (correct / total) * 100))
but received an error in training:
---> 15 loss = criterion(outputs, v)
multi-target not supported at /Users/soumith/miniconda2/conda-bld/pytorch_1532623076075/work/aten/src/THNN/generic/ClassNLLCriterion.c:21
I defined the batch size to be 4 so each a and v in the iteration should be a 4 by 60 tensor and a 4 by 3 tensor, respectively. How do I solve this problem?
The issue is most likely the target you pass to nn.CrossEntropyLoss(). You say v is a 4 x 3 tensor, which is not the shape this loss expects.
In loss = criterion(outputs, v), the loss function expects v to be a tensor of size (minibatch,) in which each value is one of the C classes (i.e. 0 to C-1). See the 'Shape' section in https://pytorch.org/docs/stable/nn.html?highlight=crossentropyloss#torch.nn.CrossEntropyLoss
Target: (N), where each value satisfies 0 ≤ targets[i] ≤ C−1
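A minimal sketch of that shape contract (the tensor values are illustrative only). If your 3-D video features are continuous values rather than class indices, a regression loss such as nn.MSELoss, which accepts a [4, 3] float target, is the shape-compatible alternative, though choosing it is a modeling decision beyond this error:

import torch
import torch.nn as nn

C = 3                                  # number of classes (illustrative)
outputs = torch.randn(4, C)            # model output: [minibatch, C]

# what nn.CrossEntropyLoss expects: one class index per sample, shape (N,)
v_class = torch.tensor([0, 2, 1, 2])   # dtype long, values in 0..C-1
loss_ce = nn.CrossEntropyLoss()(outputs, v_class)  # works

# a 4 x 3 float target fits a regression loss instead
v_reg = torch.randn(4, 3)
loss_mse = nn.MSELoss()(outputs, v_reg)             # shape-compatible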
