Why the Validation Loss is too high?

Why the Validation Loss is too high? - python

I tried to write a code which is about the brand detection. But while training the model, there are high losses in every epoch. I tried to normalize the dataset, however nothings changed. Am I doing something wrong?
My code is as below:
train_link = "C:/Users\proin\OneDrive\Masaüstü\Data_2/train"
test_link = "C:/Users\proin\OneDrive\Masaüstü\Data_2/test"
val_link = "C:/Users\proin\OneDrive\Masaüstü\Data_2/validation"
transforming_train = transforms.Compose([transforms.Resize((300, 300)), transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
transforming_test = transforms.Compose([transforms.Resize((300, 300)), transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
I did also try to plug mean and std values in normalize function but nothings changed.
Let me continue:
trainset = torchvision.datasets.ImageFolder(train_link, transform = transforming_train)
testset = torchvision.datasets.ImageFolder(test_link, transform = transforming_test)
valset = torchvision.datasets.ImageFolder(val_link, transform = transforming_test)
batch_size = 1
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
shuffle=True)
valloader = torch.utils.data.DataLoader(valset, batch_size=batch_size,
shuffle=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=1,
  shuffle=False)
Because ImageFolder has no argument of normalize, I couldn't plug it in here.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
And I'm using the CUDA as well.
def train_log_loss_network(model, train_loader, val_loader=None, epochs=50, device="cpu"):
loss_fn = nn.CrossEntropyLoss() #CrossEntropy is another name for the Logistic Regression loss function. Like before, we phrase learning as minimize a loss function. This is the loss we are going to minimize!
#We need an optimizer! Adam is a good default one that works "well enough" for most problems
#To tell Adam what to optimize, we give it the model's parameters - because thats what the learning will adjust
# optimizer = torch.optim.Adam(model.parameters())
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
#Devices can be spcified by a string, or a special torch object
#If it is a string, lets get the correct device
if device.__class__ == str:
device = torch.device(device)
model.to(device)#Place the model on the correct compute resource
for epoch in range(epochs):
model = model.train()#Put our model in training mode
running_loss = 0.0
for inputs, labels in train_loader: #tqdm(train_loader):
#Move the batch to the device we are using.
inputs, labels = inputs.cuda(), labels.cuda()
inputs = inputs.to(device)
labels = labels.to(device)
# zero the parameter gradients
optimizer.zero_grad()
y_pred = model(inputs)
# Compute loss.
loss = loss_fn(y_pred, labels.long())
# Backward pass: compute gradient of the loss with respect to model parameters
loss.backward()
# Calling the step function on an Optimizer makes an update to its parameters
optimizer.step()
running_loss += loss.item() * inputs.size(0)
if val_loader is None:
print("Loss after epoch {} is {}".format(epoch + 1, running_loss))
else:#Lets find out validation performance as we go!
model = model.eval() #Set the model to "evaluation" mode, b/c we don't want to make any updates!
predictions = []
targets = []
for inputs, labels in val_loader:
#Move the batch to the device we are using.
inputs = inputs.to(device)
labels = labels.to(device)
y_pred = model(inputs)
# Get predicted classes
# y_pred will have a shape (Batch_size, C)
#We are asking for which class had the largest response along dimension #1, the C dimension
for pred in torch.argmax(y_pred, dim=1).cpu().numpy():
predictions.append(pred)
for l in labels.cpu().numpy():
targets.append(l)
#print("Network Accuracy: ", )
print("Loss after epoch {} is {}. Accuracy: {}".format(epoch + 1, running_loss, accuracy_score(predictions, targets)))
And lastly,
train_log_loss_network(model, trainloader, val_loader=valloader, epochs=10, device=device)
Even I tried different epoch number and different conv layer, the results are similar.
Loss after epoch 1 is 72568.83042097092. Accuracy: 0.0036231884057971015
Loss after epoch 2 is 72568.78793954849. Accuracy: 0.0036231884057971015
Loss after epoch 3 is 72568.74511051178. Accuracy: 0.0036231884057971015
Loss after epoch 4 is 72568.7018828392. Accuracy: 0.014492753623188406
Loss after epoch 5 is 72568.65722990036. Accuracy: 0.014492753623188406

Related

RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[100, 1, 28, 28] to have 3 channels, but got 1 channels instead

I am trying to use a pre-trained (resnet) model on the MNIST dataset, but this error always appears to me
RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[100, 1, 28, 28] to have 3 channels, but got 1 channels instead.
This is my code:
MNIST dataset
from torchvision import datasets
import torchvision.transforms as transforms
# number of subprocesses to use for data loading
num_workers = 0
# how many samples per batch to load
batch_size = 100
# convert data to torch.FloatTensor
transform = transforms.Compose
([
transforms.ToTensor(),
transforms.RandomErasing(p=0.2)
])
# choose the training and test datasets
train_data = datasets.MNIST(root='data', train=True, download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False, download=True, transform=transform)
# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=num_workers)
MLP network definition
from torch.nn.modules.activation import ReLU
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
# building the model and the type to model → Sequential
self.model = nn.Sequential
(
# building the layers and the type to layers → Linear
nn.Linear(28 * 28 , 200), # input layer = 100
# to avoid problem to overfitting → using the Dropout (As we can see, dropouts are used to randomly remove neurons while training of the neural network.)
nn.Dropout(0.2), # to use Dropout to avoid problem → overfitting
nn.ReLU(True), # activation function
nn.BatchNorm1d(num_features = 200), # Batch normalization (also known as batch norm) is a method used
# to make training of artificial neural networks faster and more stable through normalization of the layers' inputs by re-centering and re-scaling.
nn.Linear(200 , 10), # output layer = 10
)
def forward(self, x):
x = x.view(-1, 1, 28*28)
return self.model(x)
# initialize the N
model = Net()
print(model)
model = resnet50(weights = ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(512,10)
define an optimizer to update the model parameters
## Specify loss and optimization functions
# specify loss function
criterion = nn.CrossEntropyLoss()
# specify optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
Training Data
#number of epochs to train the model
n_epochs = 1 # suggest training between 20-50 epochs
model.train() # prep model for training
for epoch in range(n_epochs):
# monitor training loss
train_loss = 0.0
###################
# train the model #
###################
for data, target in train_loader:
# clear the gradients of all optimized variables
data = data.repeat(1,3,1,1)
optimizer.zero_grad()
# forward pass: compute predicted outputs by passing inputs to the model
output = model(data)
# calculate the loss
loss = criterion(output, target)
# backward pass: compute gradient of the loss with respect to model parameters
loss.backward()
# perform a single optimization step (parameter update)
optimizer.step()
# update running training loss
train_loss += loss.item()*data.size(0)
# print training statistics
# calculate average loss over an epoch
train_loss = train_loss/len(train_loader.dataset)
print('Epoch: {} \tTraining Loss: {:.6f}'.format(
epoch+1,
train_loss
))
Initialize lists to monitor test loss and accuracy
# initialize lists to monitor test loss and accuracy
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
model.eval() # prep model for *evaluation*
for data, target in test_loader:
# forward pass: compute predicted outputs by passing inputs to the model
output = model(data)
# calculate the loss
loss = criterion(output, target)
# update test loss
test_loss += loss.item()*data.size(0)
# convert output probabilities to predicted class
_, pred = torch.max(output, 1)
# compare predictions to true label
correct = np.squeeze(pred.eq(target.data.view_as(pred)))
# calculate test accuracy for each object class
for i in range(16):
label = target.data[i]
class_correct[label] += correct[i].item()
class_total[label] += 1
# calculate and print avg test loss
test_loss = test_loss/len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))
for i in range(10):
if class_total[i] > 0:
print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
str(i), 100 * class_correct[i] / class_total[i],
np.sum(class_correct[i]), np.sum(class_total[i])))
else:
print('Test Accuracy of %5s: N/A (no training examples)' % (classes[i]))
print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
100. * np.sum(class_correct) / np.sum(class_total),
np.sum(class_correct), np.sum(class_total)))

Very high validation loss/small train loss in Pytorch, while finetuning resnet 50

I am training model to classify 2 types of images. I have decided to take a transfer-learning approach, freeze every part of resnet50 and new layer and start finetuning process. My dataset is not perfectly balanced but i used weights for that purpose.Please take a look at validation loss vs training loss graph. It seems to be extremely inconsitent. Could you please take a look at my code? I am new to Pytorch, maybe there is something wrong with my method and code. Final accuracy tested on test set is 86%. Thank you!
learning_rate = 1e-1
num_epochs = 100
patience = 10
batch_size = 100
weights = [4, 1]
model = models.resnet50(pretrained=True)
# Replace last layer
num_features = model.fc.in_features
model.fc = nn.Sequential(
nn.Linear(num_features, 512),
nn.ReLU(inplace=True),
nn.Linear(512, 64),
nn.Dropout(0.5, inplace=True),
nn.Linear(64, 2))
class_weights = torch.FloatTensor(weights).cuda()
criterion = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
running_loss = 0
losses = []
# To freeze the residual layers
for param in model.parameters():
param.requires_grad = False
for param in model.fc.parameters():
param.requires_grad = True
# Find total parameters and trainable parameters
total_params = sum(p.numel() for p in model.parameters())
print(f'{total_params:,} total parameters.')
total_trainable_params = sum(
p.numel() for p in model.parameters() if p.requires_grad)
print(f'{total_trainable_params:,} training parameters.')
24,590,082 total parameters.
1,082,050 training parameters.
# initialize the early_stopping object
early_stopping = pytorchtools.EarlyStopping(patience=patience, verbose=True)
for epoch in range(num_epochs):
##########################
#######TRAIN MODEL########
##########################
epochs_loss=0
##Switch to train mode
model.train()
for i, (images, labels) in enumerate(train_dl):
# Move tensors to the configured device
images = images.to(device)
labels = labels.to(device)
# Forward pass
# Backprpagation and optimization
optimizer.zero_grad()
outputs = model(images).to(device)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
#calculate train_loss
train_losses.append(loss.item())
##########################
#####VALIDATE MODEL#######
##########################
model.eval()
for images, labels in val_dl:
images = images.to(device)
labels = labels.to(device)
outputs = model(images).to(device)
loss = criterion(outputs,labels)
valid_losses.append(loss.item())
# print training/validation statistics
# calculate average loss over an epoch
train_loss = np.average(train_losses)
valid_loss = np.average(valid_losses)
# print(train_loss)
avg_train_losses.append(train_loss)
avg_valid_losses.append(valid_loss)
print_msg = (f'train_loss: {train_loss:.5f} ' + f'valid_loss: {valid_loss:.5f}')
print(print_msg)
# clear lists to track next epoch
train_losses = []
valid_losses = []
early_stopping(valid_loss, model)
print(epoch)
if early_stopping.early_stop:
print("Early stopping")
break

How to run one batch in pytorch?

I'm new to AI and python and I'm trying to run only one batch to aim to overfit.I found the code:
iter(train_loader).next()
but I'm not sure where to implement it in my code. even if I did, how can I check after each iteration to make sure that I'm training the same batch?
train_loader = torch.utils.data.DataLoader(
dataset_train,
batch_size=48,
shuffle=True,
num_workers=2
)
net = nn.Sequential(
nn.Flatten(),
nn.Linear(128*128*3,10)
)
nepochs = 3
statsrec = np.zeros((3,nepochs))
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)
for epoch in range(nepochs): # loop over the dataset multiple times
running_loss = 0.0
n = 0
for i, data in enumerate(train_loader, 0):
inputs, labels = data
# Zero the parameter gradients
optimizer.zero_grad()
# Forward, backward, and update parameters
outputs = net(inputs)
loss = loss_fn(outputs, labels)
loss.backward()
optimizer.step()
# accumulate loss
running_loss += loss.item()
n += 1
ltrn = running_loss/n
ltst, atst = stats(train_loader, net)
statsrec[:,epoch] = (ltrn, ltst, atst)
print(f"epoch: {epoch} training loss: {ltrn: .3f} test loss: {ltst: .3f} test accuracy: {atst: .1%}")
please give me a hint

If you are looking to train on a single batch, then remove your loop over your dataloader:
for i, data in enumerate(train_loader, 0):
inputs, labels = data
And simply get the first element of the train_loader iterator before looping over the epochs, otherwise next will be called at every iteration and you will run on a different batch every epoch:
inputs, labels = next(iter(train_loader))
i = 0
for epoch in range(nepochs):
optimizer.zero_grad()
outputs = net(inputs)
loss = loss_fn(outputs, labels)
loss.backward()
optimizer.step()
# ...

Pytorch MSE loss function nan during training

I am trying linear regression from boston dataset. MSE loss function is nan since the first iteration. I tried altering learning rate and batch_size but of no use.
from torch.utils.data import TensorDataset , DataLoader
inputs = torch.from_numpy(Features).to(torch.float32)
targets = torch.from_numpy(target).to(torch.float32)
train_ds = TensorDataset(inputs , targets)
train_dl = DataLoader(train_ds , batch_size = 5 , shuffle = True)
model = nn.Linear(13,1)
opt = optim.SGD(model.parameters(), lr=1e-5)
loss_fn = F.mse_loss
def fit(num_epochs, model, loss_fn, opt, train_dl):
# Repeat for given number of epochs
for epoch in range(num_epochs):
# Train with batches of data
for xb,yb in train_dl:
# 1. Generate predictions
pred = model(xb)
# 2. Calculate loss
loss = loss_fn(pred, yb)
# 3. Compute gradients
loss.backward()
# 4. Update parameters using gradients
opt.step()
# 5. Reset the gradients to zero
opt.zero_grad()
# Print the progress
if (epoch+1) % 10 == 0:
print('Epoch [{}/{}], Loss: {}'.format(epoch+1, num_epochs, loss.item()))
fit(100, model, loss_fn , opt , train_dl)
output

Pay attention to:
Use normalization: x = (x - x.mean()) / x.std()
y_train / y_test have to be (-1, 1) shapes. Use y_train.view(-1, 1) (if y_train is torch.Tensor or something)
(not your case, but for someone else) If you use torch.nn.MSELoss(reduction='sum') than you have to reduse the sum to mean. It can be done with torch.nn.MSELoss() or in train-loop: l = loss(y_pred, y) / y.shape[0].
Example:
...
loss = torch.nn.MSELoss()
...
for epoch in range(num_epochs):
for x, y in train_iter:
y_pred = model(x)
l = loss(y_pred, y)
optimizer.zero_grad()
l.backward()
optimizer.step()
print("epoch {} loss: {:.4f}".format(epoch + 1, l.item()))

Extract features from last hidden layer Pytorch Resnet18

I am implementing an image classifier using the Oxford Pet dataset with the pre-trained Resnet18 CNN.
The dataset consists of 37 categories with ~200 images in each of them.
Rather than using the final fc layer of the CNN as output to make predictions I want to use the CNN as a feature extractor to classify the pets.
For each image i'd like to grab features from the last hidden layer (which should be before the 1000-dimensional output layer). My model is using Relu activation so I should grab the output just after the ReLU (so all values will be non-negative)
Here is code (following the transfer learning tutorial on Pytorch):
loading data
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
image_datasets = {"train": datasets.ImageFolder('images_new/train', transforms.Compose([
transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
normalize
])), "test": datasets.ImageFolder('images_new/test', transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
normalize
]))
}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
shuffle=True, num_workers=4, pin_memory=True)
for x in ['train', 'test']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'test']}
train_class_names = image_datasets['train'].classes
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
train function
def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
since = time.time()
best_model_wts = copy.deepcopy(model.state_dict())
best_acc = 0.0
for epoch in range(num_epochs):
print('Epoch {}/{}'.format(epoch, num_epochs - 1))
print('-' * 10)
# Each epoch has a training and validation phase
for phase in ['train', 'test']:
if phase == 'train':
scheduler.step()
model.train() # Set model to training mode
else:
model.eval() # Set model to evaluate mode
running_loss = 0.0
running_corrects = 0
# Iterate over data.
for inputs, labels in dataloaders[phase]:
inputs = inputs.to(device)
labels = labels.to(device)
# zero the parameter gradients
optimizer.zero_grad()
# forward
# track history if only in train
with torch.set_grad_enabled(phase == 'train'):
outputs = model(inputs)
_, preds = torch.max(outputs, 1)
loss = criterion(outputs, labels)
# backward + optimize only if in training phase
if phase == 'train':
loss.backward()
optimizer.step()
# statistics
running_loss += loss.item() * inputs.size(0)
running_corrects += torch.sum(preds == labels.data)
epoch_loss = running_loss / dataset_sizes[phase]
epoch_acc = running_corrects.double() / dataset_sizes[phase]
print('{} Loss: {:.4f} Acc: {:.4f}'.format(
phase, epoch_loss, epoch_acc))
# deep copy the model
if phase == 'test' and epoch_acc > best_acc:
best_acc = epoch_acc
best_model_wts = copy.deepcopy(model.state_dict())
print()
time_elapsed = time.time() - since
print('Training complete in {:.0f}m {:.0f}s'.format(
time_elapsed // 60, time_elapsed % 60))
print('Best val Acc: {:4f}'.format(best_acc))
# load best model weights
model.load_state_dict(best_model_wts)
return model
Compute SGD cross-entropy loss
model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
print("number of features: ", num_ftrs)
model_ft.fc = nn.Linear(num_ftrs, len(train_class_names))
model_ft = model_ft.to(device)
criterion = nn.CrossEntropyLoss()
# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)
# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
num_epochs=24)
Now how do I get a feature vector from the last hidden layer for each of my images? I know I have to freeze the previous layer so that gradient isn't computed on them but I'm having trouble extracting the feature vectors.
My ultimate goal is to use those feature vectors to train a linear classifier such as Ridge or something like that.
Thanks!

You can try the approach below. This will work for any layer with only a change of offset.
model_ft = models.resnet18(pretrained=True)
### strip the last layer
feature_extractor = torch.nn.Sequential(*list(model_ft.children())[:-1])
### check this works
x = torch.randn([1,3,224,224])
output = feature_extractor(x) # output now has the features corresponding to input x
print(output.shape)
torch.Size([1, 512, 1, 1])

This is probably not the best idea, but you can do something like this:
#assuming model_ft is trained now
model_ft.fc_backup = model_ft.fc
model_ft.fc = nn.Sequential() #empty sequential layer does nothing (pass-through)
# or model_ft.fc = nn.Identity()
# now you use your network as a feature extractor
I also checked fc is the right attribute to change, look at forward

If you know the name of your layer (eg layer4 in resnet), you can use hooks:
def get_hidden_features(x, layer):
activation = {}
def get_activation(name):
def hook(m, i, o):
activation[name] = o.detach()
return hook
model.register_forward_hook(get_activation(layer))
_ = model(x)
return activation[layer]
get_features(inputs, "layer4")
Example: https://discuss.pytorch.org/t/how-can-i-extract-intermediate-layer-output-from-loaded-cnn-model/77301/3

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Why the Validation Loss is too high? - python

Related

RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[100, 1, 28, 28] to have 3 channels, but got 1 channels instead

Very high validation loss/small train loss in Pytorch, while finetuning resnet 50

How to run one batch in pytorch?

Pytorch MSE loss function nan during training

Extract features from last hidden layer Pytorch Resnet18

Categories

Resources