Why can't I overfit a single batch (the loss keeps oscillating)? - python

I am training a simple feedforward neural network for a regression task, and I am just trying to overfit a single batch of 32 examples (with 9 features) to see if the implementation is OK. However, the loss keeps oscillating no matter which learning rate and hidden size I try (the loss plot is not reproduced here).
The data is standard scaled. The network has just one hidden layer with ReLU:
BATCH_SIZE = 32
LR = 0.0001
NUM_EPOCHS = 100
HIDDEN_SIZE = 512

class FullyConnected(nn.Module):
    def __init__(self, hidden_size=HIDDEN_SIZE):
        super().__init__()
        self.fc1 = nn.Linear(in_features=in_features, out_features=hidden_size)
        self.fc2 = nn.Linear(in_features=hidden_size, out_features=1)

    def forward(self, x):
        out = self.fc1(x)
        out = F.relu(out)
        out = self.fc2(out)
        return out
And the training loop is the usual one, except that for now I am overfitting just a single batch:
model = FullyConnected().to(device)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=LR)

model.train()
X, y = next(iter(train_dataloader))
for epoch in range(NUM_EPOCHS):
    y_pred = model(X)
    loss = criterion(y_pred, y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
Why can't I overfit a single batch, and why is the loss oscillating in this weird way despite the learning rate being small?
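One common cause of exactly this symptom, worth checking here (an assumption, since the data pipeline is not shown): the model outputs shape (32, 1), while y coming out of a DataLoader is often shape (32,). nn.MSELoss then broadcasts the two tensors to (32, 32), which distorts the loss and can make it oscillate instead of converging. A minimal check:

# sanity check: prediction and target shapes must match exactly
print(y_pred.shape, y.shape)  # e.g. torch.Size([32, 1]) vs torch.Size([32])

# if they differ, align them before computing the loss
loss = criterion(y_pred, y.unsqueeze(1))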

Related

RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[100, 1, 28, 28] to have 3 channels, but got 1 channels instead

I am trying to use a pre-trained (ResNet) model on the MNIST dataset, but I always get this error:
RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[100, 1, 28, 28] to have 3 channels, but got 1 channels instead.
This is my code:
MNIST dataset
from torchvision import datasets
import torchvision.transforms as transforms
# number of subprocesses to use for data loading
num_workers = 0
# how many samples per batch to load
batch_size = 100
# convert data to torch.FloatTensor
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.2)
])
# choose the training and test datasets
train_data = datasets.MNIST(root='data', train=True, download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False, download=True, transform=transform)
# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=num_workers)
MLP network definition
from torch.nn.modules.activation import ReLU
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # build the model as a Sequential container
        self.model = nn.Sequential(
            # input layer: 28 * 28 = 784 features
            nn.Linear(28 * 28, 200),
            # Dropout randomly removes neurons during training to reduce overfitting
            nn.Dropout(0.2),
            nn.ReLU(True),  # activation function
            # Batch normalization makes training faster and more stable by
            # re-centering and re-scaling the layer's inputs
            nn.BatchNorm1d(num_features=200),
            nn.Linear(200, 10),  # output layer = 10 classes
        )

    def forward(self, x):
        x = x.view(-1, 1, 28*28)
        return self.model(x)
# initialize the NN
model = Net()
print(model)
model = resnet50(weights = ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(512,10)
Define an optimizer to update the model parameters
## Specify loss and optimization functions
# specify loss function
criterion = nn.CrossEntropyLoss()
# specify optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
Training Data
#number of epochs to train the model
n_epochs = 1 # suggest training between 20-50 epochs
model.train() # prep model for training
for epoch in range(n_epochs):
    # monitor training loss
    train_loss = 0.0

    ###################
    # train the model #
    ###################
    for data, target in train_loader:
        # repeat the single grayscale channel to get the 3 channels ResNet expects
        data = data.repeat(1, 3, 1, 1)
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update running training loss
        train_loss += loss.item()*data.size(0)

    # print training statistics
    # calculate average loss over an epoch
    train_loss = train_loss/len(train_loader.dataset)
    print('Epoch: {} \tTraining Loss: {:.6f}'.format(
        epoch+1,
        train_loss
    ))
Initialize lists to monitor test loss and accuracy
# initialize lists to monitor test loss and accuracy
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

model.eval()  # prep model for *evaluation*
for data, target in test_loader:
    # forward pass: compute predicted outputs by passing inputs to the model
    output = model(data)
    # calculate the loss
    loss = criterion(output, target)
    # update test loss
    test_loss += loss.item()*data.size(0)
    # convert output probabilities to predicted class
    _, pred = torch.max(output, 1)
    # compare predictions to true label
    correct = np.squeeze(pred.eq(target.data.view_as(pred)))
    # calculate test accuracy for each object class
    for i in range(16):
        label = target.data[i]
        class_correct[label] += correct[i].item()
        class_total[label] += 1

# calculate and print avg test loss
test_loss = test_loss/len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            str(i), 100 * class_correct[i] / class_total[i],
            np.sum(class_correct[i]), np.sum(class_total[i])))
    else:
        print('Test Accuracy of %5s: N/A (no training examples)' % (classes[i]))

print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))
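No accepted fix is shown above, but here is a sketch of one way to make the shapes consistent: convert MNIST to 3 channels inside the transform pipeline, so that both the training loop and the test loop (which lacks the data.repeat call used during training) feed 3-channel tensors to the pretrained ResNet. Note also that resnet50's final layer expects 2048 input features, so it is safer to read in_features from the model instead of hard-coding 512:

transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),  # replicate the single MNIST channel to 3
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.2)
])

model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 10)  # in_features is 2048 for resnet50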

Difference between the calculation of the training loss and validation loss using pytorch

I want to use the following code from this traditional image classification problem for my regression problem. The code can be found here:
GeeksforGeeks-Training Neural Networks with Validation using Pytorch
class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.fc1 = nn.Linear(28*28, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(1, -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
model = Network()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr = 0.01)
epochs = 5
for e in range(epochs):
    train_loss = 0.0
    model.train()  # Optional when not using Model Specific layer
    for data, labels in trainloader:
        if torch.cuda.is_available():
            data, labels = data.cuda(), labels.cuda()
        optimizer.zero_grad()
        target = model(data)
        loss = criterion(target, labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

    valid_loss = 0.0
    model.eval()  # Optional when not using Model Specific layer
    for data, labels in validloader:
        if torch.cuda.is_available():
            data, labels = data.cuda(), labels.cuda()
        target = model(data)
        loss = criterion(target, labels)
        valid_loss = loss.item() * data.size(0)

    print(f'Epoch {e+1} \t\t Training Loss: {train_loss / len(trainloader)} \t\t Validation Loss: {valid_loss / len(validloader)}')
I can understand why the training loss is summed up and then divided by the length of the training data in this example, but I can't see why the validation loss is not also summed up and divided by the length. If I understand correctly, the validation loss here ends up being just the loss of the last batch, multiplied by the batch size.
Is this calculation of the validation loss the correct way to do it? Can I use the code for my regression problem, assuming I use regression-specific metrics (e.g. MSE instead of CrossEntropyLoss etc.)?
Yes, you can use the code for your regression task. The targets in the code example are the class labels 0 to 9 of the MNIST digits; in the regression case you would use a scalar target instead. The loss function, which is the cross-entropy in the example, can be replaced by the MSE in your case.
The validation loss in this example is indeed not computed correctly: valid_loss = loss.item() * data.size(0) overwrites the value on every batch instead of accumulating it, so the printed number is effectively extrapolated from the last batch alone. Multiplying by data.size(0) (the batch size) but then dividing by len(validloader) (the number of batches) is also inconsistent with how the training loss is averaged.
On the web page, by contrast, the validation loss is accumulated over all batches of the validation set, as it should be done.
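To make the printed validation loss consistent with the training loss, accumulate it over every batch instead of overwriting it. A corrected version of the validation part of the loop above could look like this (same names as in the question):

valid_loss = 0.0
model.eval()  # Optional when not using Model Specific layer
with torch.no_grad():  # gradients are not needed for validation
    for data, labels in validloader:
        if torch.cuda.is_available():
            data, labels = data.cuda(), labels.cuda()
        target = model(data)
        loss = criterion(target, labels)
        valid_loss += loss.item()  # accumulate instead of overwriting

print(f'Validation Loss: {valid_loss / len(validloader)}')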

Good accuracy and loss on training vs bad accuracy on validation

I am learning PyTorch and I have created a binary classification algorithm. After training the model I have very low loss and quite good accuracy. However, on validation the accuracy is exactly 50%. I am wondering whether I loaded the samples incorrectly or the algorithm does not perform well.
Here you can find the plot of Training loss and accuracy.
Here is my training method:
epochs = 15
itr = 1
p_itr = 100
model.train()
total_loss = 0
loss_list = []
acc_list = []

for epoch in range(epochs):
    for samples, labels in train_loader:
        samples, labels = samples.to(device), labels.to(device)
        optimizer.zero_grad()
        output = model(samples)
        labels = labels.unsqueeze(-1)
        labels = labels.float()
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        scheduler.step()
        #if itr%p_itr == 0:
        pred = torch.round(output)
        correct = pred.eq(labels)
        acc = torch.mean(correct.float())
        print('[Epoch {}/{}] Iteration {} -> Train Loss: {:.4f}, Accuracy: {:.3f}'.format(epoch+1, epochs, itr, total_loss/p_itr, acc))
        loss_list.append(total_loss/p_itr)
        acc_list.append(acc)
        total_loss = 0
        itr += 1
Here, I am loading data from the path:
train_list_cats = glob.glob(os.path.join(train_cats_dir,'*.jpg'))
train_list_dogs = glob.glob(os.path.join(train_dogs_dir,'*.jpg'))
train_list = train_list_cats + train_list_dogs
val_list_cats = glob.glob(os.path.join(validation_cats_dir,'*.jpg'))
val_list_dogs = glob.glob(os.path.join(validation_dogs_dir,'*.jpg'))
val_list = val_list_cats + val_list_dogs
I am not attaching the model architecture, however I can add it if required.
I think that my training method is correct, although I am not sure about the training/validation data processing.
Edit:
The network params are as follows:
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001)
criterion = nn.BCELoss()
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[500,1000,1500], gamma=0.5)
Activation function is sigmoid.
The network architecture:
self.layer1 = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Dropout(p=0.2)
)
self.layer2 = nn.Sequential(
    nn.Conv2d(16, 32, kernel_size=3),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Dropout(p=0.2)
)
self.layer3 = nn.Sequential(
    nn.Conv2d(32, 64, kernel_size=3),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Dropout(p=0.2)
)
self.fc1 = nn.Linear(17*17*64, 512)
self.fc2 = nn.Linear(512, 1)
self.relu = nn.ReLU()
self.sigmoid = nn.Sigmoid()

def forward(self, x):
    out = self.layer1(x)
    out = self.layer2(out)
    out = self.layer3(out)
    out = out.view(out.size(0), -1)
    out = self.relu(self.fc1(out))
    out = self.fc2(out)
    return torch.sigmoid(out)
Going by your "Training loss and accuracy" plot, your model is overfitting. Your train loss is near zero after 25 epochs, yet you continue training for 200+ epochs. That is the wrong way to train a model. Instead, do early stopping based on the validation set: run one epoch of training followed by one epoch of evaluation, and repeat. Stop when the training metrics keep improving but the corresponding validation metrics do not.
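A minimal early-stopping loop along those lines could look like this (a sketch; train_one_epoch and evaluate are hypothetical helpers that run one training epoch and return the validation loss, respectively):

best_val_loss = float('inf')
patience, bad_epochs = 5, 0

for epoch in range(max_epochs):
    train_one_epoch(model, train_loader)    # hypothetical helper
    val_loss = evaluate(model, val_loader)  # hypothetical helper
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        bad_epochs = 0
        torch.save(model.state_dict(), 'best.pt')  # keep the best weights so far
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # validation stopped improving
            break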

Loss Function Not Decreasing in CNN

I am new to PyTorch and I'm training a model for binary classification of images. The images are currently stored as .npy files, and I am loading them and training my model in batches. When I implement this, the loss function does not decrease. When I test the model on the training and test sets afterwards, the accuracy is constant at 50%. The data set is balanced.
I tried making the dataset smaller (around 125 images for each class) and I still have the same problem. I expected the model to overfit the training set, but this does not occur.
Please see my code below
class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=2, out_channels=32, kernel_size=3)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
        self.conv3 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3)
        self.fc1 = nn.Linear(in_features=128*6*6, out_features=1000)
        self.fc2 = nn.Linear(in_features=1000, out_features=100)
        self.out = nn.Linear(in_features=100, out_features=2)

    def forward(self, t):
        POOL_stride = 2
        # Conv1
        t = F.relu(self.conv1(t))
        t = F.max_pool2d(t, kernel_size=2, stride=POOL_stride)
        # Conv2
        t = F.relu(self.conv2(t))
        t = F.max_pool2d(t, kernel_size=2, stride=POOL_stride)
        # Conv3
        t = F.relu(self.conv3(t))
        t = F.max_pool2d(t, kernel_size=2, stride=POOL_stride)
        # dense 1
        t = t.reshape(-1, 128*6*6)
        t = self.fc1(t)
        t = F.relu(t)
        # dense 2
        t = self.fc2(t)
        t = F.relu(t)
        t = self.out(t)
        return t

def npy_loader(path):
    sample = torch.from_numpy(np.load(path))
    return sample

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(self.model.parameters(), lr=0.003)
model = Network()

trainset = datasets.DatasetFolder(
    root=train_dir,
    loader=npy_loader,
    extensions=['.npy']
)
train_loader = torch.utils.data.DataLoader(
    trainset,
    batch_size=batch_size,
    shuffle=True,
)

for epoch in range(epochs):
    running_loss = 0
    batches = 0
    for inputs, labels in train_loader:
        batches = batches + 1
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        output = model(inputs)
        loss = criterion(output.squeeze(), labels.squeeze())
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print('Loss :{:.4f} Epoch[{}/{}]'.format(running_loss/batches, epoch, epochs))
You are passing the parameters of some other self.model to the optimizer, while the model used for calculating the loss is a different one:
optimizer = optim.Adam(self.model.parameters(), lr=0.003)
model = Network()
That is your sequence for defining the optimizer and the model. Notice that you pass the parameters of a different self.model to the optimizer, so optimizer.step() never updates the weights of the model on which the loss is calculated. Instead it should be:
model = Network()
optimizer = optim.Adam(model.parameters(), lr=0.003)
On another note, since your task is only binary classification, you could also explore returning a 1-d output from the model and using a binary cross-entropy loss instead of a 2-dimensional output with CrossEntropyLoss.
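A minimal sketch of that binary variant (assuming the output layer is changed to out_features=1; the tensors here are stand-ins for your real model output and labels):

import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()  # applies sigmoid internally, so the model returns raw logits

logits = torch.randn(8, 1)                  # stand-in for model output with out_features=1
labels = torch.randint(0, 2, (8,)).float()  # binary labels must be floats for BCE
loss = criterion(logits, labels.unsqueeze(1))  # shapes must match: (8, 1) vs (8, 1)
print(loss.item())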

Pytorch vs. Keras: Pytorch model overfits heavily

For several days now, I have been trying to replicate my Keras training results with PyTorch. Whatever I do, the PyTorch model overfits far earlier and more strongly on the validation set than in Keras. For PyTorch I use the same Xception code from https://github.com/Cadene/pretrained-models.pytorch.
The data loading, the augmentation, the validation, the training schedule etc. are equivalent. Am I missing something obvious? There must be a general problem somewhere. I have tried thousands of different module configurations, but nothing seems to come even close to the Keras training. Can somebody help?
Keras model: val accuracy > 90%
# base model
base_model = applications.Xception(weights='imagenet', include_top=False, input_shape=(img_width, img_height, 3))
# top model
x = base_model.output
x = GlobalMaxPooling2D()(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(4, activation='softmax')(x)
# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)
# Compile model
from keras import optimizers
adam = optimizers.Adam(lr=0.0001)
model.compile(loss='categorical_crossentropy',
              optimizer=adam, metrics=['accuracy'])
# LROnPlateau etc. with equivalent settings as pytorch
Pytorch model: val accuracy ~81%
from xception import xception
import torch.nn.functional as F
# modified from https://github.com/Cadene/pretrained-models.pytorch
class XCeption(nn.Module):
    def __init__(self, num_classes):
        super(XCeption, self).__init__()
        original_model = xception(pretrained="imagenet")
        self.features = nn.Sequential(*list(original_model.children())[:-1])
        self.last_linear = nn.Sequential(
            nn.Linear(original_model.last_linear.in_features, 512),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(512, num_classes)
        )

    def logits(self, features):
        x = F.relu(features)
        x = F.adaptive_max_pool2d(x, (1, 1))
        x = x.view(x.size(0), -1)
        x = self.last_linear(x)
        return x

    def forward(self, input):
        x = self.features(input)
        x = self.logits(x)
        return x
device = torch.device("cuda")
model = XCeption(len(class_names))
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    model = nn.DataParallel(model)
model.to(device)

criterion = nn.CrossEntropyLoss(size_average=False)
optimizer = optim.Adam(model.parameters(), lr=0.0001)
scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.2, patience=5, cooldown=5)
Thank you very much!
Update:
Settings:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=lr)
scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.2, patience=5, cooldown=5)
model = train_model(model, train_loader, val_loader,
                    criterion, optimizer, scheduler,
                    batch_size, trainmult=8, valmult=10,
                    num_epochs=200, epochs_top=0)
Cleaned training function:
def train_model(model, train_loader, val_loader, criterion, optimizer, scheduler, batch_size, trainmult=1, valmult=1, num_epochs=None, epochs_top=0):
    for epoch in range(num_epochs):
        for phase in ['train', 'val']:
            running_loss = 0.0
            running_acc = 0
            total = 0
            # Iterate over data.
            if phase == "train":
                model.train(True)  # Set model to training mode
                for i in range(trainmult):
                    for data in train_loader:
                        # get the inputs
                        inputs, labels = data
                        inputs, labels = inputs.to(torch.device("cuda")), labels.to(torch.device("cuda"))
                        # zero the parameter gradients
                        optimizer.zero_grad()
                        # forward
                        outputs = model(inputs)  # notinception
                        _, preds = torch.max(outputs, 1)
                        loss = criterion(outputs, labels)
                        # backward + optimize only if in training phase
                        loss.backward()
                        optimizer.step()
                        # statistics
                        total += labels.size(0)
                        running_loss += loss.item()*labels.size(0)
                        running_acc += torch.sum(preds == labels)
                train_loss = (running_loss/total)
                train_acc = (running_acc.double()/total)
            else:
                model.train(False)  # Set model to evaluate mode
                with torch.no_grad():
                    for i in range(valmult):
                        for data in val_loader:
                            # get the inputs
                            inputs, labels = data
                            inputs, labels = inputs.to(torch.device("cuda")), labels.to(torch.device("cuda"))
                            # zero the parameter gradients
                            optimizer.zero_grad()
                            # forward
                            outputs = model(inputs)
                            _, preds = torch.max(outputs, 1)
                            loss = criterion(outputs, labels.data)
                            # statistics
                            total += labels.size(0)
                            running_loss += loss.item()*labels.size(0)
                            running_acc += torch.sum(preds == labels)
                val_loss = (running_loss/total)
                val_acc = (running_acc.double()/total)
                scheduler.step(val_loss)
    return model
It may be because of the type of weight initialization you are using; otherwise this should not happen. Try using the same initializer in both models.
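For reference, Keras Dense layers default to Glorot (Xavier) uniform initialization, while PyTorch's nn.Linear uses a different default, so one way to test this hypothesis is to re-initialize the PyTorch head explicitly (a sketch, applied to the model defined in the question; the only nn.Linear modules it touches are the ones in the new last_linear head):

import torch.nn as nn

def init_like_keras(m):
    # Glorot/Xavier uniform weights and zero biases, matching Keras Dense defaults
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

model.apply(init_like_keras)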
self.features=nn.Sequential(*list(original_model.children())[:-1])
Are you sure that this line re-instantiates your model in exactly the same way? You are using an nn.Sequential instead of the original Xception model's forward function. If anything in that forward function isn't exactly equivalent to running the modules through an nn.Sequential, it will not reproduce the same performance.
Instead of wrapping it in a Sequential, you could just change this:
my_model = Xception()
# load weights before you change the architecture
my_model = load_weights(path_to_weights)  # pseudocode: load the pretrained weights here
# read the input size of the original head before overwriting it
in_features = my_model.last_linear.in_features
# overwrite the original's last_linear with your own
my_model.last_linear = nn.Sequential(
    nn.Linear(in_features, 512),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(512, num_classes)
)
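As a quick sanity check (assuming Xception's default 299x299 input size and the 4 output classes used in the Keras model above), a dummy forward pass should now produce the expected shape:

import torch

x = torch.randn(2, 3, 299, 299)  # dummy batch of 2 images
out = my_model(x)
print(out.shape)  # expected: torch.Size([2, 4])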
