I'm very new to pytorch and I'm very stuck with model converging. It seems to me it is not learning since the loss/r2 do not improve.
What I've checked/tried based on the suggestions I found here.
changes/wrote from scratch loss function
set "loss.requires_grad = True"
tried to feed the data without dataloader / just straight manual batches
played with 2d data / mean pooled data!!! I got decent results for mean pooled data in Random Forest and SVM regressor, but not in NN, which confuses me and makes me think that the data is OK and net is NOT ok!
played with learning rate, batch size
etc.
About the data: bert embeddings from letter-sequence, each data point is 1024 features 43 rows (for Conv1d I transpose it to 1024*43)
Total >40K data point in train, batch size=64
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F ## relu, tahn
import torch.utils.data as DataLoader # helps create batches to train on
from scipy.stats import pearsonr
import numpy as np
import torch.utils.data as data_utils
torch.set_printoptions(precision=10)
#Hyperparameters
learning_rate=0.001
batch_size = 64
num_epochs=100
data_train1 = torch.Tensor(data_train)
targets_train1=torch.Tensor(targets_train)
dataset_train = data_utils.TensorDataset(data_train1, targets_train1)
train_loader = DataLoader.DataLoader(dataset=dataset_train, batch_size=batch_size, shuffle=True)
class NN (nn.Module):
def __init__(self):#input_size=43x1024
super(NN,self).__init__()
self.layers = nn.Sequential(
nn.Conv1d(1024, 512, kernel_size=4), #I tried different in and out here
nn.ELU(),
nn.BatchNorm1d(512),
nn.Flatten(),
nn.Linear(512*40, 512),
nn.ReLU(),
nn.Linear(512, 1)
)
def forward(self, x):
return self.layers(x)
torch.manual_seed(100)
#Initialize network
model=NN().to(device)
#Loss and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate) #to check
#Training the model
metric_name='r2'
for epoch in range(num_epochs):
score=[]
loss_all=[]
print(f"Epoch: {epoch+1}/{num_epochs}")
model.train()
for batch_idx, (data, targets) in enumerate(train_loader):
data=data.to(device=device)
targets=targets.to(device=device)
optimizer.zero_grad()
#forward
predictions=model(data)
loss=criterion(predictions,targets).to(device)
loss.requires_grad = True
#backward
loss_all.append(loss.item())
loss.backward()
#gradient descent or adam step
optimizer.step()
#computing r squared
output=predictions.detach().cpu().numpy()
target=targets.detach().cpu().numpy()
output=np.squeeze(output)
target=np.squeeze(target)
score.append(pearsonr(target, output)[0]**2)
total_score = sum(score)/len(score)
print(f'training {metric_name}: {total_score}, mean loss: {sum(loss_all)/len(loss_all)}')
Output for (10) first Epochs:
Epoch: 1/100 training r2: 0.0026224905802415955, mean loss: 0.5084380856556941
Epoch: 2/100 training r2: 0.0026334153423518466, mean loss: 0.5082988155293148
Epoch: 3/100 training r2: 0.002577073836564485, mean loss: 0.5085703951569392
Epoch: 4/100 training r2: 0.002633483899689855, mean loss: 0.5081870414129565
Epoch: 5/100 training r2: 0.0025642136678393776, mean loss: 0.5083346445680192
Epoch: 6/100 training r2: 0.0026261540869286933, mean loss: 0.5084220717277274
Epoch: 7/100 training r2: 0.002614604670602339, mean loss: 0.5082813335398275
Epoch: 8/100 training r2: 0.0024826257263258784, mean loss: 0.5086268588042153
Epoch: 9/100 training r2: 0.00261018096876641, mean loss: 0.5082496945227619
Epoch: 10/100 training r2: 0.002542892071836945, mean loss: 0.5088265852086478
Response is float64 in range (-2,2).
Hope you can help! Thank you in advance for your time!
Update 1. with response scaled to float64 in range [-1,1] and tanh still not converging. I feel like something general is missing. BTW, when I do non shuffled batches and straight-forward data for batches (first batch with indices 0-63, second bath 64-127 etc) I get the same score results in each of epochs!
Update 2. tried to add (2) more sequences with Conv1d, BatchNorm1d, ELU (1024->512, 512->256, 256->128, kernel size 4) and the result is worse:( it is not learning at all!
Related
I'm training an Xception model on keras without using pre-trained weights for a Binary Classification problem, and I get a very strange behavior. The history plot shows that the training accuracy is increasing until it touch 100%, while the validation one is always around 50%, so it seems it's overfitting but that's not the case, since I checked and it always predict (close to) 1 even on the training set.
What could be the reason of this behaviour?
Here is the code I'm using to train. x_train_xception has already been preprocessed by the keras.applications.xception.preprocess_input function.
I'm using the same code, except the model creation, for training a pretrained Xception model, which works fine
inLayerX = Input((512, 512, 4))
xceptionModel = keras.applications.Xception(include_top = True, weights=None, input_tensor=inLayerX, classes=1, classifier_activation= 'sigmoid')
xceptionModel.compile(loss= 'binary_crossentropy', metrics = ['accuracy'])
history = xceptionModel.fit(x_train_xception, y_train, batch_size= batch_size, epochs= epochs, validation_data=(x_val_xception, y_val))
_, accTest = xceptionModel.evaluate(x_test_xception, y_test)
_, accVal = xceptionModel.evaluate(x_val_xception, y_val)
_, accTrain = xceptionModel.evaluate(x_train_xception, y_train)
print("Training accuracy {:.2%}".format(accTrain))
print("Validation accuracy {:.2%}".format(accVal))
print("Test accuracy {:.2%}".format(accTest))
which output:
2/2 [==============================] - 6s 2s/step - loss: 1.2063 - accuracy: 0.5000
1/1 [==============================] - 4s 4s/step - loss: 1.2960 - accuracy: 0.4667
4/4 [==============================] - 5s 1s/step - loss: 1.2025 - accuracy: 0.5083
Training accuracy 50.83%
Validation accuracy 46.67%
Test accuracy 50.00%
the validation and test accuracy is expected but what's bugging me is really the training accuracy, which I would expect to be close to 100% looking at the history.
Model Accuracy
Model Loss
I'm trying to make a model predict the race of a 75x75 image's ethnicity, but when ever I train the model, the accuracy always stays completely still at 53.2%. I didn't realize why until I actually ran it on some of photos. It turned out, that no matter what the photo was, it would always predict 'other'. I'm not entirely sure why though.
I copied the code over from the official PyTorch Quickstart tutorial, and in that dataset or the standard MNIST data, it worked fine. I changed the dataset to the UTKFace, and then it started only predicting one label, all the time.
Here's my code:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder
from torchvision.transforms import ToTensor
import torch.nn.functional as F
training_data = ImageFolder(
root = "data_training/",
transform = ToTensor(),
)
testing_data = ImageFolder(
root = "data_testing/",
transform = ToTensor()
)
training_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(testing_data, batch_size=64, shuffle=True)
class NeuralNetwork(nn.Module):
def __init__(self):
super(NeuralNetwork, self).__init__()
self.conv1 = nn.Conv2d(3, 6, 5)
self.pool = nn.MaxPool2d(2, 2)
self.conv2 = nn.Conv2d(6, 16, 5)
self.fc1 = nn.Linear(1296, 1024)
self.fc2 = nn.Linear(1024, 1024)
self.fc3 = nn.Linear(1024, 512)
self.fc4 = nn.Linear(512, 84)
self.fc5 = nn.Linear(84, 5)
def forward(self, x):
x = self.pool(F.relu(self.conv1(x)))
x = self.pool(F.relu(self.conv2(x)))
x = x.view(x.size(0), -1)
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
x = F.relu(self.fc3(x))
x = F.relu(self.fc4(x))
x = self.fc5(x)
return x
model = NeuralNetwork().to("cpu")
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
def train(dataloader, model, loss_fn, optimizer):
size = len(dataloader.dataset)
for batch, (X, y) in enumerate(dataloader):
X, y = X.to("cpu"), y.to("cpu")
# Compute prediction error
pred = model(X)
loss = loss_fn(pred, y)
# Backpropagation
optimizer.zero_grad()
loss.backward()
optimizer.step()
if batch % 100 == 0:
loss, current = loss.item(), batch * len(X)
print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
def tests(dataloader, model):
size = len(dataloader.dataset)
model.eval()
test_loss, correct = 0, 0
with torch.no_grad():
for X, y in dataloader:
X, y = X.to("cpu"), y.to("cpu")
pred = model(X)
test_loss += loss_fn(pred, y).item()
correct += (pred.argmax(1) == y).type(torch.float).sum().item()
test_loss /= size
correct /= size
print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
epochs = 10
for t in range(epochs):
print(f"Epoch {t+1}\n-------------------------------")
train(training_dataloader, model, loss_fn, optimizer)
tests(test_dataloader, model)
torch.save(model.state_dict(), "model.pth")
The training logs:
Epoch 1
-------------------------------
loss: 1.628994 [ 0/23705]
loss: 1.620698 [ 6400/23705]
loss: 1.615423 [12800/23705]
loss: 1.596390 [19200/23705]
Test Error:
Accuracy: 53.2%, Avg loss: 0.024725
Epoch 2
-------------------------------
loss: 1.593613 [ 0/23705]
loss: 1.581375 [ 6400/23705]
loss: 1.583656 [12800/23705]
loss: 1.591942 [19200/23705]
Test Error:
Accuracy: 53.2%, Avg loss: 0.024165
Epoch 3
-------------------------------
loss: 1.541260 [ 0/23705]
loss: 1.592345 [ 6400/23705]
loss: 1.540908 [12800/23705]
loss: 1.540741 [19200/23705]
Test Error:
Accuracy: 53.2%, Avg loss: 0.023705
Epoch 4
-------------------------------
loss: 1.566888 [ 0/23705]
loss: 1.524875 [ 6400/23705]
loss: 1.540764 [12800/23705]
loss: 1.510044 [19200/23705]
Test Error:
Accuracy: 53.2%, Avg loss: 0.023323
Epoch 5
-------------------------------
loss: 1.530084 [ 0/23705]
loss: 1.498773 [ 6400/23705]
loss: 1.537755 [12800/23705]
loss: 1.508989 [19200/23705]
Test Error:
Accuracy: 53.2%, Avg loss: 0.022993
....
No matter how many epochs I set it to, or how many layers I add in to try to get it to overfit, it always just seems to guess the same thing over and over again, with no signs of improvement.
I separated the UTKFace dataset into folders based on the ethnicity category of the name. There are 23705 images in the training data and 10134 in the testing.
I'm not sure why this is happening. Is my dataset not large enough? Are there not enough layers?
The number of layers and the dataset size don't explain this behavior for this example. Your CNN is behaving as a constant function, so far I don't know why, but these might be some clues:
Since you have separated your data by label into folders, if you are training your model using only one of those folders you will obtain a constant function.
The last layer of your neural network has no activation function! This is, in method forward you are doing x = self.fc5(x) instead of x = F.<function>(self.fc5(x)).
Where do you indicate, when loading the training data, which is the label for each image? Are you sure that training_dataloader is loading the images with their correct label?
few comments:
Did you check the ground truth in the test data (the shape may be different)
can you check the output probabilities to see if the predictions are unanimous ? (Btw you don't necessarily need a activation function at the end in this case as the pytorch crossentropy already contains a logsoftmax)
Did you try conv2d with more filters (like 16,32 or 64)
The % of error seems fine as in the link you put, the accuracy is around 35%
It does seems a bit weird not to be able to over-fit.
I have a dataset that I used for making NN model in Keras, i took 2000 rows from that dataset to have them as validation data, those 2000 rows should be added in .predict function.
I wrote a code for Keras NN and for now it works good, but I noticed something that is very strange for me. It gives me very good accuracy of more than 83%, loss is around 0.12, but when I want to make a prediction with unseen data (those 2000 rows), it only predicts correct in average of 65%.
When I add Dropout layer, it only decreases accuracy.
Then I have added EarlyStopping, and it gave me accuracy around 86%, loss is around 0.10, but still when I make prediction with unseen data, I get final prediction accuracy of 67%.
Does this mean that model made correct prediction in 87% of situations? Im going with a logic, if I add 100 samples in my .predict function, that program should make good prediction for 87/100 samples, or somewhere in that range (lets say more than 80)? I have tried to add 100, 500, 1000, 1500 and 2000 samples in my .predict function, and it always make correct prediction in 65-68% of the samples.
Why is that, am I doing something wrong?
I have tried to play with number of layers, number of nodes, with different activation functions and with different optimizers but it only changes the results by 1-2%.
My dataset looks like this:
DataFrame shape (59249, 33)
x_train shape (47399, 32)
y_train shape (47399,)
x_test shape (11850, 32)
y_test shape (11850,)
testing_features shape (1000, 32)
This is my NN model:
model = Sequential()
model.add(Dense(64, input_dim = x_train.shape[1], activation = 'relu')) # input layer requires input_dim param
model.add(Dropout(0.2))
model.add(Dense(32, activation = 'relu'))
model.add(Dropout(0.2))
model.add(Dense(16, activation = 'relu'))
model.add(Dense(1, activation='sigmoid')) # sigmoid instead of relu for final probability between 0 and 1
# compile the model, adam gradient descent (optimized)
model.compile(loss="binary_crossentropy", optimizer= "adam", metrics=['accuracy'])
# call the function to fit to the data training the network)
es = EarlyStopping(monitor='val_loss', min_delta=0.0, patience=1, verbose=0, mode='auto')
model.fit(x_train, y_train, epochs = 15, shuffle = True, batch_size=32, validation_data=(x_test, y_test), verbose=2, callbacks=[es])
scores = model.evaluate(x_test, y_test)
print(model.metrics_names[0], round(scores[0]*100,2), model.metrics_names[1], round(scores[1]*100,2))
These are the results:
Train on 47399 samples, validate on 11850 samples
Epoch 1/15
- 25s - loss: 0.3648 - acc: 0.8451 - val_loss: 0.2825 - val_acc: 0.8756
Epoch 2/15
- 9s - loss: 0.2949 - acc: 0.8689 - val_loss: 0.2566 - val_acc: 0.8797
Epoch 3/15
- 9s - loss: 0.2741 - acc: 0.8773 - val_loss: 0.2468 - val_acc: 0.8849
Epoch 4/15
- 9s - loss: 0.2626 - acc: 0.8816 - val_loss: 0.2416 - val_acc: 0.8845
Epoch 5/15
- 10s - loss: 0.2566 - acc: 0.8827 - val_loss: 0.2401 - val_acc: 0.8867
Epoch 6/15
- 8s - loss: 0.2503 - acc: 0.8858 - val_loss: 0.2364 - val_acc: 0.8893
Epoch 7/15
- 9s - loss: 0.2480 - acc: 0.8873 - val_loss: 0.2321 - val_acc: 0.8895
Epoch 8/15
- 9s - loss: 0.2450 - acc: 0.8886 - val_loss: 0.2357 - val_acc: 0.8888
11850/11850 [==============================] - 2s 173us/step
loss 23.57 acc 88.88
And this is for prediction:
#testing_features are 2000 rows that i extracted from dataset (these samples are not used in training, this is separate dataset thats imported)
prediction = model.predict(testing_features , batch_size=32)
res = []
for p in prediction:
res.append(p[0].round(0))
# Accuracy with sklearn - also much lower
acc_score = accuracy_score(testing_results, res)
print("Sklearn acc", acc_score)
result_df = pd.DataFrame({"label":testing_results,
"prediction":res})
result_df["prediction"] = result_df["prediction"].astype(int)
s = 0
for x,y in zip(result_df["label"], result_df["prediction"]):
if x == y:
s+=1
print(s,"/",len(result_df))
acc = s*100/len(result_df)
print('TOTAL ACC:', round(acc,2))
The problem is...now I get accuracy with sklearn 52% and my_acc 52%.
Why do I get such low accuracy on validation, when it says that its much larger?
The training data you posted gives high validation accuracy, so I'm a bit confused as to where you get that 65% from, but in general when your model performs much better on training data than on unseen data, that means you're over fitting. This is a big and recurring problem in machine learning, and there is no method guaranteed to prevent this, but there are a couple of things you can try:
regularizing the weights of your network, e.g. using l2 regularization
using stochastic regularization techniques such as drop-out during training
early stopping
reducing model complexity (but you say you've already tried this)
I will list the problems/recommendations that I see on your model.
What are you trying to predict? You are using sigmoid activation function in the last layer which seems it is a binary classification but in your loss fuction you used mse which seems strange. You can try binary_crossentropy instead of mse loss function for your model.
Your model seems suffer from overfitting so you can increase the prob. of Dropout and also add new Dropout between other hidden layers or you can remove one of the hidden layers because it seem your model is too complex.
You can change your neuron numbers in layers like a narrower => 64 -> 32 -> 16 -> 1 or try different NN architectures.
Try adam optimizer instead of sgd.
If you have 57849 sample you can use 47000 samples in training+validation and rest of will be your test set.
Don't use the same sets for your evaluation and validation. First split your data into train and test set. Then when you are fitting your model give validation_split_ratio then it will automatically give validation set from your training set.
I am trying to evaluate my model on the whole training set after each epoch.
This is what I did:
torch.manual_seed(1)
model = ConvNet(num_classes=num_classes)
cost_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
def compute_accuracy(model, data_loader):
correct_pred, num_examples = 0, 0
for features, targets in data_loader:
logits = model(features)
predicted_labels = torch.argmax(logits, 1)
num_examples += targets.size(0)
correct_pred += (predicted_labels == targets).sum()
return correct_pred.float()/num_examples * 100
for epoch in range(num_epochs):
model = model.train()
for features, targets in train_loader:
logits = model(features)
cost = cost_fn(logits, targets)
optimizer.zero_grad()
cost.backward()
optimizer.step()
model = model.eval()
print('Epoch: %03d/%03d training accuracy: %.2f%%' % (
epoch+1, num_epochs,
compute_accuracy(model, train_loader)))
the output was convincing:
Epoch: 001/005 training accuracy: 89.08%
Epoch: 002/005 training accuracy: 90.41%
Epoch: 003/005 training accuracy: 91.70%
Epoch: 004/005 training accuracy: 92.31%
Epoch: 005/005 training accuracy: 92.95%
But then I added another line at the end of the training loop, to also evaluate the model on the whole test set after each epoch:
for epoch in range(num_epochs):
model = model.train()
for features, targets in train_loader:
logits = model(features)
cost = cost_fn(logits, targets)
optimizer.zero_grad()
cost.backward()
optimizer.step()
model = model.eval()
print('Epoch: %03d/%03d training accuracy: %.2f%%' % (
epoch+1, num_epochs,
compute_accuracy(model, train_loader)))
print('\t\t testing accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))
But the training accuracies started to change:
Epoch: 001/005 training accuracy: 89.08%
testing accuracy: 87.66%
Epoch: 002/005 training accuracy: 90.42%
testing accuracy: 89.04%
Epoch: 003/005 training accuracy: 91.84%
testing accuracy: 90.01%
Epoch: 004/005 training accuracy: 91.86%
testing accuracy: 89.83%
Epoch: 005/005 training accuracy: 92.45%
testing accuracy: 90.32%
Am I doing something wrong? I expected the training accuracies to remain the same because the manual seed is 1 in both cases.
Is this an expected output ?
The random seed had been set wasn't stop the model for learning to get higher accuracy becuase the random seed is a number for Pseudo random. In this case, you had told the model to shuffle the training data with a random number("1").
For the purposes of this MWE I'm trying to fit a linear regression using a custom loss function with multiple terms. However, I'm running into strange behavior when trying to weight the different terms in my loss function by dotting a weight vector with my losses. Just summing the losses works as expected; however, when dotting the weights and losses the backpropagation gets broken somehow and the loss function doesn't decrease.
I've tried enabling and disabling requires_grad on both tensors, but have been unable to replicate the expected behavior.
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
# Hyper-parameters
input_size = 1
output_size = 1
num_epochs = 60
learning_rate = 0.001
# Toy dataset
x_train = np.array([[3.3], [4.4], [5.5], [6.71], [6.93], [4.168],
[9.779], [6.182], [7.59], [2.167], [7.042],
[10.791], [5.313], [7.997], [3.1]], dtype=np.float32)
y_train = np.array([[1.7], [2.76], [2.09], [3.19], [1.694], [1.573],
[3.366], [2.596], [2.53], [1.221], [2.827],
[3.465], [1.65], [2.904], [1.3]], dtype=np.float32)
# Linear regression model
model = nn.Linear(input_size, output_size)
# Loss and optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
def loss_fn(outputs, targets):
l1loss = torch.norm(outputs - targets, 1)
l2loss = torch.norm(outputs - targets, 2)
# This works as expected
# loss = 1 * l1loss + 1 * l2loss
# Loss never changes, no matter what combination of
# requires_grad I set
loss = torch.dot(torch.tensor([1.0, 1.0], requires_grad=False),
torch.tensor([l1loss, l2loss], requires_grad=True))
return loss
# Train the model
for epoch in range(num_epochs):
# Convert numpy arrays to torch tensors
inputs = torch.from_numpy(x_train)
targets = torch.from_numpy(y_train)
# Forward pass
outputs = model(inputs)
loss = loss_fn(outputs, targets)
# Backward and optimize
optimizer.zero_grad()
loss.backward()
optimizer.step()
if (epoch+1) % 5 == 0:
print ('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))
# Plot the graph
predicted = model(torch.from_numpy(x_train)).detach().numpy()
plt.plot(x_train, y_train, 'ro', label='Original data')
plt.plot(x_train, predicted, label='Fitted line')
plt.legend()
plt.show()
Expected result: loss function decreases and the linear regression is fitted (see output below)
Epoch [5/60], Loss: 7.9943
Epoch [10/60], Loss: 7.7597
Epoch [15/60], Loss: 7.6619
Epoch [20/60], Loss: 7.6102
Epoch [25/60], Loss: 7.4971
Epoch [30/60], Loss: 7.4106
Epoch [35/60], Loss: 7.3942
Epoch [40/60], Loss: 7.2438
Epoch [45/60], Loss: 7.2322
Epoch [50/60], Loss: 7.1012
Epoch [55/60], Loss: 7.0701
Epoch [60/60], Loss: 6.9612
Actual result: no change in loss function
Epoch [5/60], Loss: 73.7473
Epoch [10/60], Loss: 73.7473
Epoch [15/60], Loss: 73.7473
Epoch [20/60], Loss: 73.7473
Epoch [25/60], Loss: 73.7473
Epoch [30/60], Loss: 73.7473
Epoch [35/60], Loss: 73.7473
Epoch [40/60], Loss: 73.7473
Epoch [45/60], Loss: 73.7473
Epoch [50/60], Loss: 73.7473
Epoch [55/60], Loss: 73.7473
Epoch [60/60], Loss: 73.7473
I'm pretty confused as to why such a simple operation is breaking the backpropagation gradients and would really appreciate it if anyone had some insights on why this isn't working.
Use torch.cat((loss1, loss2)), you are creating new Tensor from existing tensors destroying graph.
Anyway you shouldn't do that unless you are trying to generalize your loss function, it's pretty unreadable. Simple addition is way better.