How to create a confusion matrix for an image segmentation task? - python

I'm dealing with a binary image segmentation problem. I've successfully compiled and trained the model. Now I'm trying to achieve two goals:
Get a total confusion matrix for a test set (reason: understand proportions of false positives and false negatives)
Get an individual confusion matrix for every image in a test set (reason: find and analyze images that drag model performance down)
As far as I understand, confusion_matrix from the scikit-learn package can produce the total confusion matrix, but I can't make it work with my custom data generator. According to the docs, this is its signature:
sklearn.metrics.confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None, normalize=None)
I don't understand how to retrieve y_true with my custom data generator:
def learn_generator(templates_folder, masks_folder, image_width, batch_size, shuffle=True):
    """Generate individual batches from the dataset."""
    counter = 0
    images_list = os.listdir(templates_folder)
    if shuffle:
        random.shuffle(images_list)
    while True:
        templates_pack = np.zeros((batch_size, image_width, image_width, 3)).astype('float')
        masks_pack = np.zeros((batch_size, image_width, image_width, 1)).astype('float')
        for i in range(counter, counter + batch_size):
            template = cv2.imread(templates_folder + '/' + images_list[i]) / 255.
            templates_pack[i - counter] = template
            mask = cv2.imread(masks_folder + '/' + images_list[i], cv2.IMREAD_GRAYSCALE) / 255.
            mask = np.expand_dims(mask, axis=2)
            masks_pack[i - counter] = mask
        counter += batch_size
        if counter + batch_size >= len(images_list):
            counter = 0
            if shuffle:
                random.shuffle(images_list)
        yield templates_pack, masks_pack
test_templates_path = "E:/Project/images/all_templates/test"
test_masks_path = "E:/Project/images/all_masks/test"
TEST_SET_SIZE = len(os.listdir(test_templates_path))
IMAGE_WIDTH = 512
BATCH_SIZE = 4
TEST_STEPS = TEST_SET_SIZE / BATCH_SIZE
test_generator = learn_generator(test_templates_path, test_masks_path, IMAGE_WIDTH, batch_size=BATCH_SIZE)
Y_pred = model.predict_generator(test_generator, steps=TEST_STEPS)
y_pred = np.argmax(Y_pred, axis=1)
y_true = ???
As for individual confusion matrices, I have no ideas at all...
Any help is appreciated.

I suppose it's too late for you, but maybe this could help somebody else:
I achieved this by using the definition of a confusion matrix, i.e. by counting True Positives, True Negatives, False Positives and False Negatives.
This code works only for binary segmentation, assuming that "1" is the "positive" output and "0" the "negative" one...
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Y_pred and Y_val are the binarized (0/1) predicted and ground-truth masks
FP = len(np.where(Y_pred - Y_val == 1)[0])
FN = len(np.where(Y_pred - Y_val == -1)[0])
TP = len(np.where(Y_pred + Y_val == 2)[0])
TN = len(np.where(Y_pred + Y_val == 0)[0])
cmat = np.array([[TP, FN], [FP, TN]])

plt.figure(figsize=(6, 6))
sns.heatmap(cmat / np.sum(cmat), cmap="Reds", annot=True, fmt='.2%', square=True, linewidths=2.)
plt.xlabel("predictions")
plt.ylabel("real values")
plt.show()
(resulting confusion-matrix heatmap)
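To address the question's second goal (per-image matrices, to find the images that drag performance down), the same counting can be done image by image. This is my own sketch, not part of the original answer, and it assumes Y_pred has already been thresholded to 0/1 and that Y_pred and Y_val hold one mask per entry along the first axis:
# per-image confusion matrices: a sketch under the assumptions above
per_image_cms = []
for pred_mask, true_mask in zip(Y_pred, Y_val):
    tp = np.sum((pred_mask == 1) & (true_mask == 1))
    tn = np.sum((pred_mask == 0) & (true_mask == 0))
    fp = np.sum((pred_mask == 1) & (true_mask == 0))
    fn = np.sum((pred_mask == 0) & (true_mask == 1))
    per_image_cms.append(np.array([[tp, fn], [fp, tn]]))

# e.g. rank images by false-negative count to find the worst offenders
worst = np.argsort([cm[0, 1] for cm in per_image_cms])[::-1]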

https://www.kite.com/blog/python/image-segmentation-tutorial/
I think you could adapt the code from this website to two classes and apply it to your use case.
Additionally, I think this answer provides an implementation built on sklearn's confusion matrix method: Faster method of computing confusion matrix?
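For completeness, a minimal sketch of my own (not from either answer) of feeding flattened binary masks to sklearn.metrics.confusion_matrix; it assumes the ground-truth masks have been collected from the generator in the same order as the predictions and that predictions are thresholded at 0.5:
import numpy as np
from sklearn.metrics import confusion_matrix

# y_true_masks, y_pred_masks: arrays with one mask per image, values in [0, 1]
y_true = (y_true_masks > 0.5).astype(int).ravel()
y_pred = (y_pred_masks > 0.5).astype(int).ravel()
total_cm = confusion_matrix(y_true, y_pred, labels=[0, 1])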

Related

k nearest neighbour using numpy

k nearest neighbour is a useful algorithm for classifying labels, and I have some queries about it.
If I have a train set (1000, 400), a test set (300, 400) and train labels (1000,), how can I apply k nearest neighbour, using numpy, to find the right test labels? Thank you!
Assuming you have a basic understanding of numpy, this is some old code of mine for a KNN classifier. If you have any trouble using or understanding it, I can explain it to you tomorrow when I have some free time, in case no one responds until then.
import numpy as np
from sklearn.metrics import accuracy_score as accuracy

class Knn_classifier:
    def __init__(self, train_images, train_labels):
        self.train_images = train_images
        self.train_labels = train_labels

    def classify_image(self, test_image, num_neighbors=3, metric='l2'):
        if metric == 'l2':
            # Euclidean distance from the test image to every training image
            distances = np.sqrt(np.sum(
                np.square(self.train_images - test_image),
                axis=1
            ))
        else:
            # L1 (Manhattan) distance
            distances = np.sum(np.abs(self.train_images - test_image), axis=1)
        indexes = np.argsort(distances)
        indexes = indexes[:num_neighbors]
        labels = self.train_labels[indexes]
        # majority vote among the nearest neighbours
        label = np.argmax(np.bincount(labels))
        return label

    def classify_images(self, test_images, num_neighbors=3, metric='l2'):
        labels = []
        for image in test_images:
            labels.append(self.classify_image(image, num_neighbors, metric))
        return labels

    def accuracy_score(self, predicted, ground_truth):
        return accuracy(predicted, ground_truth) * 100

train_images = np.load('data/train_images.npy')  # load training images
train_labels = np.load('data/train_labels.npy')  # load training labels
test_images = np.load('data/test_images.npy')    # load testing images
test_labels = np.load('data/test_labels.npy')    # load testing labels

knn = Knn_classifier(train_images, train_labels)
predicted = knn.classify_images(test_images, metric='l1')
knn.accuracy_score(predicted, test_labels)
The shapes of my train_images and train_labels were (1000, 784) and (1000,).
Here's a fully vectorized solution.
import numpy as np
(N_train, N_test, N_feats, N_labels, k) = (1000, 300, 400, 20, 5)
train_X = np.random.rand(N_train, N_feats)
train_y = np.random.randint(N_labels, size=N_train)
test_X = np.random.rand(N_test, N_feats)
# See: https://jaykmody.com/blog/distance-matrices-with-numpy/.
test_X2 = np.sum(test_X**2, axis=1, keepdims=True)
train_X2 = np.sum(train_X**2, axis=1)
test_train_X = test_X @ train_X.T
sq_dists = test_X2 - 2 * test_train_X + train_X2
k_nearest_neighbors = np.argsort(sq_dists, axis=1)[:, :k]
k_labels = train_y[k_nearest_neighbors]
# See: https://stackoverflow.com/a/71812803/1316276.
k_labels_onehot = k_labels[..., None] == np.arange(N_labels)[None, None, :]
pred_y = np.argmax(np.count_nonzero(k_labels_onehot, axis=1), axis=-1)
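As a sanity check (my addition, not part of the original answer), the vectorized result can be cross-checked against scikit-learn's KNeighborsClassifier on the same arrays:
from sklearn.neighbors import KNeighborsClassifier

sk_knn = KNeighborsClassifier(n_neighbors=k)
sk_knn.fit(train_X, train_y)
sk_pred = sk_knn.predict(test_X)
# agreement should be close to 1.0 (ties may be broken differently)
print(np.mean(sk_pred == pred_y))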

Negative loss when trying to implement aleatoric uncertainty estimation according to Kendall et al

I'm trying to implement a neural network with aleatoric uncertainty estimation for regression with pytorch according to
Kendall et al.: "What Uncertainties Do We Need in Bayesian Deep
Learning for Computer Vision?" (Link).
However, while the predicted regression values fit the desired ground truth values quite well, the predicted variance looks weird and the loss gets negative during training.
The paper suggests having two outputs, mean and variance, instead of only predicting the regression value. To be more precise, it suggests predicting the mean and log(variance) for stability reasons. Therefore, my network looks as follows:
class ReferenceResNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fcl1 = nn.Linear(1, 32)
        self.fcl2 = nn.Linear(32, 64)
        self.fcl3 = nn.Linear(64, 128)
        self.fcl_mean = nn.Linear(128, 1)
        self.fcl_var = nn.Linear(128, 1)

    def forward(self, x):
        x = torch.tanh(self.fcl1(x))
        x = torch.tanh(self.fcl2(x))
        x = torch.tanh(self.fcl3(x))
        mean = self.fcl_mean(x)
        log_var = self.fcl_var(x)
        return mean, log_var
According to the paper, given these outputs, the corresponding loss function consists of a residual regression part and a regularization term:

L = (1/N) * sum_i [ 1/2 * exp(-s_i) * ||y_i - f(x_i)||^2 + 1/2 * s_i ]

where s_i is the log(variance) predicted by the network.
I implemented this loss function accordingly:
def loss_function(pred_mean, pred_log_var, y):
    return 1/len(pred_mean) * (0.5 * torch.exp(-pred_log_var) * torch.sqrt(torch.pow(y - pred_mean, 2)) + 0.5 * pred_log_var).sum()
I tried this code on a self-generated toy dataset (see image with results); however, the loss gets negative during training, and when I plot the variance over the dataset after training, it does not really make sense to me, while the corresponding mean values fit the ground truth quite well:
I have already figured out that the negative loss comes from the regularization term, as logarithms are negative for values between 0 and 1; however, I don't believe that the absolute value of the regularization term is supposed to grow bigger than the regression part. Does anyone know the reason for this and how I can prevent it from happening? And why does my variance look so weird?
For reproduction, my full code looks as follows:
import torch.nn as nn
import torch
import torch.nn.functional as F
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.utils.data.dataset import TensorDataset
from torchvision import datasets, transforms
import math
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class ReferenceRegNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fcl1 = nn.Linear(1, 32)
        self.fcl2 = nn.Linear(32, 64)
        self.fcl3 = nn.Linear(64, 128)
        self.fcl_mean = nn.Linear(128, 1)
        self.fcl_var = nn.Linear(128, 1)

    def forward(self, x):
        x = torch.tanh(self.fcl1(x))
        x = torch.tanh(self.fcl2(x))
        x = torch.tanh(self.fcl3(x))
        mean = self.fcl_mean(x)
        log_var = self.fcl_var(x)
        return mean, log_var

def toy_function(x):
    return math.sin(x/15-4) + 2 + math.sin(x/10-5)

def loss_function(x_mean, x_log_var, y):
    return 1/len(x_mean) * (0.5 * torch.exp(-x_log_var) * torch.sqrt(torch.pow(y - x_mean, 2)) + 0.5 * x_log_var).sum()

BATCH_SIZE = 10
EVAL_BATCH_SIZE = 10
CLASSES = 1
TRAIN_EPOCHS = 50

# generate toy dataset: a train set in the form of a complex sin-curve
x_train_data = np.array([])
y_train_data = np.array([])
for repeat in range(2):
    for i in range(50, 150):
        for j in range(100):
            sampled_x = i + np.random.randint(101)/100
            sampled_y = toy_function(sampled_x) + np.random.normal(0, 0.2)
            x_train_data = np.append(x_train_data, sampled_x)
            y_train_data = np.append(y_train_data, sampled_y)
x_eval_data = list(np.arange(50.0, 150.0, 0.1))
y_eval_data = [toy_function(x) for x in x_eval_data]

LOADER_KWARGS = {'num_workers': 0, 'pin_memory': False} if torch.cuda.is_available() else {}
train_set = TensorDataset(torch.Tensor(x_train_data), torch.Tensor(y_train_data))
eval_set = TensorDataset(torch.Tensor(x_eval_data), torch.Tensor(y_eval_data))
train_loader = torch.utils.data.DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True, **LOADER_KWARGS)
eval_loader = torch.utils.data.DataLoader(eval_set, batch_size=EVAL_BATCH_SIZE, shuffle=False, **LOADER_KWARGS)
TRAIN_SIZE = len(train_loader.dataset)
EVAL_SIZE = len(eval_loader.dataset)
assert (TRAIN_SIZE % BATCH_SIZE) == 0
assert (EVAL_SIZE % EVAL_BATCH_SIZE) == 0

net = ReferenceRegNet().to(DEVICE)
optimizer = optim.Adam(net.parameters(), lr=1e-3)
losses = {}

# train network
for epoch in range(1, TRAIN_EPOCHS+1):
    net.train()
    mean_epoch_loss = 0
    mean_epoch_mse = 0
    # train batches
    for batch_idx, (data, target) in enumerate(tqdm(train_loader), start=1):
        data, target = (data.to(DEVICE)).unsqueeze(dim=1), (target.to(DEVICE)).unsqueeze(dim=1)
        optimizer.zero_grad()
        output_means, output_log_var = net(data)
        target_np = target.detach().cpu().numpy()
        output_means_np = output_means.detach().cpu().numpy()
        loss = loss_function(output_means, output_log_var, target)
        loss_value = loss.item()  # get raw float value out of the loss tensor
        mean_epoch_loss += loss_value
        # optimize network
        loss.backward()
        optimizer.step()
    mean_epoch_loss = mean_epoch_loss / len(train_loader)
    losses.update({epoch: mean_epoch_loss})
    print("Epoch " + str(epoch) + ": Train-Loss = " + str(mean_epoch_loss))
    net.eval()
    with torch.no_grad():
        mean_loss = 0
        mean_mse = 0
        for data, target in eval_loader:
            data, target = (data.to(DEVICE)).unsqueeze(dim=1), (target.to(DEVICE)).unsqueeze(dim=1)
            output_means, output_log_var = net(data)  # perform prediction
            target_np = target.detach().cpu().numpy()
            output_means_np = output_means.detach().cpu().numpy()
            mean_loss += loss_function(output_means, output_log_var, target).item()
        mean_loss = mean_loss / len(eval_loader)
        # print("Epoch " + str(epoch) + ": Eval-loss = " + str(mean_loss))

fig = plt.figure(figsize=(40, 12))  # create a 40x12 inch figure
ax = fig.add_subplot(1, 3, 1)
ax.set_title("regression value")
ax.set_xlabel("x")
ax.set_ylabel("regression mean")
ax.plot(x_train_data, y_train_data, 'x', color='black')
ax.plot(x_eval_data, y_eval_data, color='red')

pred_means_list = []
output_vars_list_train = []
output_vars_list_test = []
for x_test in sorted(x_train_data):
    x_test = (torch.Tensor([x_test]).to(DEVICE))
    pred_means, output_log_vars = net.forward(x_test)
    pred_means_list.append(pred_means.detach().cpu())
    output_vars_list_train.append(torch.exp(output_log_vars).detach().cpu())
ax.plot(sorted(x_train_data), pred_means_list, color='blue', label='training_perform')

pred_means_list = []
for x_test in x_eval_data:
    x_test = (torch.Tensor([x_test]).to(DEVICE))
    pred_means, output_log_vars = net.forward(x_test)
    pred_means_list.append(pred_means.detach().cpu())
    output_vars_list_test.append(torch.exp(output_log_vars).detach().cpu())
ax.plot(sorted(x_eval_data), pred_means_list, color='green', label='eval_perform')
plt.tight_layout()
plt.legend()

ax = fig.add_subplot(1, 3, 2)
ax.set_title("variance")
ax.set_xlabel("x")
ax.set_ylabel("regression var")
ax.plot(sorted(x_train_data), output_vars_list_train, label='training data')
ax.plot(x_eval_data, output_vars_list_test, label='test data')
plt.tight_layout()
plt.legend()

ax = fig.add_subplot(1, 3, 3)
ax.set_title("training loss")
ax.set_xlabel("Epoch")
ax.set_ylabel("Loss")
lists = sorted(losses.items())
epoch, loss = zip(*lists)
ax.plot(epoch, loss, label='loss')
plt.tight_layout()
plt.legend()
plt.savefig('ref_test.png')
TLDR: The optimization drives the loss to a minimum where the gradient
becomes zero, regardless of what the nominal loss value is.
A comprehensive explanation by K.Frank:
A smaller loss – algebraically less positive or algebraically more
negative – means (or should mean) better predictions. The
optimization step uses some version of gradient descent to make
your loss smaller. The overall level of the loss doesn’t matter as
far as the optimization goes. The gradient tells the optimizer how
to change the model parameters to reduce the loss, and it doesn’t
care about the overall level of the loss.
An example from the same source:
Consider, for example, optimizing with lossA = MSELoss. Now
imagine optimizing with lossB = lossA - 17.2. The 17.2 doesn’t
really change anything at all. It is true that “perfect” predictions
will yield lossB = -17.2 rather than zero. (lossA will, of course,
be zero for “perfect” predictions.) But who cares?
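To make that concrete, here is a small self-contained sketch of my own (with made-up tensors) showing that subtracting a constant changes the loss value, possibly making it negative, but leaves the gradient untouched:
import torch
import torch.nn.functional as F

pred = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
target = torch.tensor([1.5, 1.5, 2.5])

loss_a = F.mse_loss(pred, target)
loss_b = loss_a - 17.2  # shifted loss, may well be negative

grad_a = torch.autograd.grad(loss_a, pred, retain_graph=True)[0]
grad_b = torch.autograd.grad(loss_b, pred)[0]
print(torch.allclose(grad_a, grad_b))  # True: the constant offset does not affect the gradient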
In your example, you are right: the negative loss value comes from the logarithmic term. This is completely OK, and it means that your training is dominated by contributions from high-confidence loss terms. Regarding the high values of the variance, I can't comment much on that, but it should be fine since the loss curve drops as expected.

Numpy NN giving weird results on synthetic dataset

I am following the book Grokking Deep Learning (Ch 8, code here) to build a numpy neural network which can classify MNIST digits with ~82% test accuracy. But when I modify the NN to work on a synthetic dataset, it goes to a specific train accuracy (depending on the hidden layer dimension and alpha) and stays there right from the start of training. Please check:
import numpy as np
import sys
from sklearn import datasets

X, y = datasets.make_classification(n_samples=10000, n_features=5, n_classes=4,
                                    n_clusters_per_class=1, shuffle=True, random_state=1)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

def relu(x):
    return (x >= 0) * x  # returns x if x > 0, returns 0 otherwise

def relu2deriv(output):
    return output >= 0  # returns 1 for input > 0

def onehot(arr):
    one_hot_labels = np.zeros((len(arr), 4))
    for i, l in enumerate(arr):
        one_hot_labels[i][l] = 1
    return one_hot_labels

y_train = onehot(y_train)
y_test = onehot(y_test)

alpha, iterations, hidden_size = (0.002, 300, 10)
weights_0_1 = 0.2*np.random.random((5, hidden_size)) - 0.1
weights_1_2 = 0.2*np.random.random((hidden_size, 4)) - 0.1

for j in range(iterations):
    error, correct_cnt = (0.0, 0)
    for i in range(len(X_train)):
        layer_0 = X_train[i:i+1]
        layer_1 = relu(np.dot(layer_0, weights_0_1))
        dropout_mask = np.random.randint(2, size=layer_1.shape)
        layer_1 *= dropout_mask * 2
        layer_2 = np.dot(layer_1, weights_1_2)
        error += np.sum((y_train[i:i+1] - layer_2) ** 2)
        correct_cnt += int(np.argmax(layer_2) == np.argmax(y_train[i:i+1]))
        layer_2_delta = (y_train[i:i+1] - layer_2)
        layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2deriv(layer_1)
        layer_1_delta *= dropout_mask
        weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
        weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)
    if (j % 1 == 0):  # can be set for any interval
        test_error = 0.0
        test_correct_cnt = 0
        for i in range(len(X_test)):
            layer_0 = X_test[i:i+1]
            layer_1 = relu(np.dot(layer_0, weights_0_1))
            layer_2 = np.dot(layer_1, weights_1_2)
            test_error += np.sum((y_test[i:i+1] - layer_2) ** 2)
            test_correct_cnt += int(np.argmax(layer_2) == np.argmax(y_test[i:i+1]))
        sys.stdout.write("\n" + \
            "I:" + str(j) + \
            " Test-Err:" + str(test_error / float(len(X_test)))[0:5] + \
            " Test-Acc:" + str(test_correct_cnt / float(len(X_test))) + \
            " Train-Err:" + str(error / float(len(X_train)))[0:5] + \
            " Train-Acc:" + str(correct_cnt / float(len(X_train))))
Output:
I:0 Test-Err:0.470 Test-Acc:0.812 Train-Err:0.704 Train-Acc:0.572
I:1 Test-Err:0.452 Test-Acc:0.811 Train-Err:0.574 Train-Acc:0.626625
I:2 Test-Err:0.445 Test-Acc:0.814 Train-Err:0.571 Train-Acc:0.61425
.
.
.
I:297 Test-Err:0.470 Test-Acc:0.7685 Train-Err:0.613 Train-Acc:0.6045
I:298 Test-Err:0.492 Test-Acc:0.785 Train-Err:0.612 Train-Acc:0.60525
I:299 Test-Err:0.478 Test-Acc:0.778 Train-Err:0.614 Train-Acc:0.60725
What's going on? How can this NN perform on the MNIST dataset but not on this dataset?
I believe the problem lies in the fact that you have no bias terms.
For example,
layer_1 = relu(np.dot(layer_0,weights_0_1))
Geometrically, that means the output of layer 1 (and of the rest of the layers) has no translation term, so the decision boundary is forced to pass through the origin.
See visualization
Thus it may be impossible for a decision boundary to be learned for data that is not around 0.
Think of data that is closely clustered around (0, 1) and (0, 2) for binary classification. No linear boundary that passes through (0, 0) can separate those clusters.
See here for a nice explanation on why bias is required.
I believe (though I did not check) that adding bias terms should allow for convergence.
layer_1 = relu(np.dot(layer_0,weights_0_1) + layer_0_bias)
and so on.
The bias' derivative is discussed here.
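A minimal sketch of my own of what adding biases could look like, assuming the rest of the loop stays as in the question; bias_0_1 and bias_1_2 are hypothetical names:
# hypothetical bias terms, initialised to zero
bias_0_1 = np.zeros((1, hidden_size))
bias_1_2 = np.zeros((1, 4))

# forward pass with biases
layer_1 = relu(np.dot(layer_0, weights_0_1) + bias_0_1)
layer_2 = np.dot(layer_1, weights_1_2) + bias_1_2

# backward pass: the gradient of a layer's bias is just that layer's delta
# (summed over the batch), so the updates mirror the weight updates
weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
bias_1_2 += alpha * layer_2_delta.sum(axis=0, keepdims=True)
weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)
bias_0_1 += alpha * layer_1_delta.sum(axis=0, keepdims=True)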
There are more possible reasons:
The raw output of layer_2 is fed to a squared-error (MSE-style) loss instead of being passed through a softmax and trained with an NLL / cross-entropy loss (see the sketch after this list).
The inputs are not normalized, which may keep the net from learning. This is unlikely to matter for the synthetic data, which comes from a hypercube, but is likely to matter for other, general data.
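A minimal sketch of that swap (my own, keeping the rest of the loop from the question unchanged):
def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# the output layer becomes a probability distribution over the 4 classes
layer_2 = softmax(np.dot(layer_1, weights_1_2))
# for softmax + cross-entropy with one-hot targets, the gradient w.r.t. the logits
# is (prediction - target), so the existing (target - prediction) delta and the
# "+= alpha * ..." updates keep exactly the same form
layer_2_delta = (y_train[i:i+1] - layer_2)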

I get a tensor of 600 values instead of 3 values for mean and std of train_loader in PyTorch

I am trying to normalize my image data, and for that I need to find the mean and std over train_loader.
mean = 0.0
std = 0.0
nb_samples = 0.0
for data in train_loader:
    images, landmarks = data["image"], data["landmarks"]
    batch_samples = images.size(0)
    images_data = images.view(batch_samples, images.size(1), -1)
    mean += torch.Tensor.float(images_data).mean(2).sum(0)
    std += torch.Tensor.float(images_data).std(2).sum(0)
    ###mean += images_data.mean(2).sum(0)
    ###std += images_data.std(2).sum(0)
    nb_samples += batch_samples
mean /= nb_samples
std /= nb_samples
The mean and std here each have shape torch.Size([600]).
When I tried (almost) same code on dataloader, it worked as expected:
# code from https://discuss.pytorch.org/t/about-normalization-using-pre-trained-vgg16-networks/23560/6?u=mona_jalal
mean = 0.0
std = 0.0
nb_samples = 0.0
for data in dataloader:
    images, landmarks = data["image"], data["landmarks"]
    batch_samples = images.size(0)
    images_data = images.view(batch_samples, images.size(1), -1)
    mean += images_data.mean(2).sum(0)
    std += images_data.std(2).sum(0)
    nb_samples += batch_samples
mean /= nb_samples
std /= nb_samples
and I got:
mean is: tensor([0.4192, 0.4195, 0.4195], dtype=torch.float64), std is: tensor([0.1182, 0.1184, 0.1186], dtype=torch.float64)
So my dataloader is:
class MothLandmarksDataset(Dataset):
    """Face Landmarks dataset."""

    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.landmarks_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.landmarks_frame)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
        img_name = os.path.join(self.root_dir, self.landmarks_frame.iloc[idx, 0])
        image = io.imread(img_name)
        landmarks = self.landmarks_frame.iloc[idx, 1:]
        landmarks = np.array([landmarks])
        landmarks = landmarks.astype('float').reshape(-1, 2)
        sample = {'image': image, 'landmarks': landmarks}
        if self.transform:
            sample = self.transform(sample)
        return sample

transformed_dataset = MothLandmarksDataset(csv_file='moth_gt.csv',
                                           root_dir='.',
                                           transform=transforms.Compose([
                                               Rescale(256),
                                               RandomCrop(224),
                                               ToTensor()
                                           ]))

dataloader = DataLoader(transformed_dataset, batch_size=3,
                        shuffle=True, num_workers=4)
and train_loader is:
# Device configuration
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
seed = 42
np.random.seed(seed)
torch.manual_seed(seed)
# split the dataset into validation and test sets
len_valid_set = int(0.1*len(dataset))
len_train_set = len(dataset) - len_valid_set
print("The length of Train set is {}".format(len_train_set))
print("The length of Test set is {}".format(len_valid_set))
train_dataset, valid_dataset = torch.utils.data.random_split(dataset, [len_train_set, len_valid_set])
# shuffle and batch the datasets
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=8, shuffle=True, num_workers=4)
test_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=8, shuffle=True, num_workers=4)
Please let me know if more information is needed.
I basically need to get 3 values for mean of train_loader and 3 values for std of train_loader to use as args for Normalize.
images_data in dataloader is torch.Size([3, 3, 50176]) inside the loop and images_data in train_loader is torch.Size([8, 600, 2400])
First, the weird shape you get for your mean and std ([600]) is unsurprising: it comes from your data having the shape [8, 600, 800, 3]. Basically, the channel dimension is the last one here, so when you try to flatten your images with
# (N, 600, 800, 3) -> [view] -> (N, 600, 2400 = 800*3)
images_data = images.view(batch_samples, images.size(1), -1)
you actually perform a weird operation that fuses together the width and channel dimensions of your image, which is now [8, 600, 2400]. Thus, applying
# (8, 600, 2400) -> [mean(2)] -> (8, 600) -> [sum(0)] -> (600)
data.mean(2).sum(0)
creates a tensor of size [600], which is indeed what you get.
There are two quite simple solutions:
Either you start by permuting the dimensions to make the channel dimension the second one:
batch_samples = images.size(0)
# (N, H, W, C) -> (N, C, H, W)
reordered = images.permute(0, 3, 1, 2)
# flatten image into (N, C, H*W)
images_data = reordered.view(batch_samples, reordered.size(1), -1)
# mean is now (C) = (3)
mean += images_data.mean(2).sum(0)
Or you change the axes along which mean and sum are applied:
batch_samples = images.size(0)
# flatten image into (N, H*W, C) -- careful, this is not what you did
images_data = images.view(batch_samples, -1, images.size(-1))
# mean is now (C) = (3)
mean += images_data.mean(1).sum(0)
Finally, why did dataloader and train_loader behave differently? Well, I think it's because one uses dataset while the other uses transformed_dataset. In transformed_dataset, you apply the ToTensor transform, which casts a PIL image into a torch tensor, and I think that pytorch is smart enough to permute your dimensions during this operation (and put the channels in the second dimension). In other words, your two datasets just do not yield images in an identical format; they differ by a permutation of the axes.
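Putting the first fix together with the loop from the question, a minimal sketch for the channels-last train_loader (dictionary keys and the (N, H, W, C) layout are assumed from the question):
mean = 0.0
std = 0.0
nb_samples = 0.0
for data in train_loader:
    images = data["image"].float()
    batch_samples = images.size(0)
    # (N, H, W, C) -> (N, C, H*W)
    images_data = images.permute(0, 3, 1, 2).reshape(batch_samples, images.size(-1), -1)
    mean += images_data.mean(2).sum(0)
    std += images_data.std(2).sum(0)
    nb_samples += batch_samples
mean /= nb_samples
std /= nb_samples  # both now have shape (3,)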

Compare predicted image class to actual image class with keras

I am training a keras model to recognise images of cats, dogs and horses.
So far, I have one-hot-encoded my data (since this is a multi-class classification problem), trained my model and called the predictions.
def read_and_process_images(list_of_images):
    X = []  # images
    y = []  # labels
    for image in list_of_images:
        try:
            X.append(cv2.resize(cv2.imread(image, cv2.IMREAD_COLOR), (nrows, ncolumns), interpolation=cv2.INTER_CUBIC))
            if 'dog' in image:
                y.append(0)
            elif 'cat' in image:
                y.append(1)
            elif 'horse' in image:
                y.append(2)
        except Exception as e:
            print(str(e))
    return X, y
...
X_test, y_test = read_and_process_images(test_imgs)
x = np.array(X_test)
test_datagen = ImageDataGenerator(rescale=1./255)

i = 0
text_labels = []
plt.figure(figsize=(30, 20))
for batch in test_datagen.flow(x, batch_size=1):
    pred = model.predict(batch)
    print(np.argmax(pred))
    if np.argmax(pred) == 0:
        text_labels.append('dog')
    elif np.argmax(pred) == 1:
        text_labels.append('cat')
    else:
        text_labels.append('horse')
    plt.subplot(5 / columns + 1, columns, i+1)
    plt.title('I think this is a ' + text_labels[i])
    imgplot = plt.imshow(batch[0])
    i += 1
    if i % 10 == 0:
        break
plt.show()
The model seems to be working very well. I usually get between 7 and 10 correct predictions, depending on the batch size. However, I do not understand how model.predict chooses the batches, and I am therefore unable to compare the actual values to the predicted values. When I try the following:
y_pred = model.predict(x, batch_size=1)
matrix = confusion_matrix(y_test, y_pred.argmax(axis=1))
the confusion matrix that I get is completely nonsensical (for example, it tells me that it only got one cat correct, but I can clearly see with some batches that it got many more correct). Could someone explain to me how the .predict function goes about choosing its batches, and how I can successfully compare the predicted values to the actual test values? Thank you in advance.
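For reference, a minimal sketch of my own (not an answer from the thread) of the comparison the question is after: it keeps predictions and labels in the original order of x and y_test and applies the same 1./255 rescaling that the generator uses; adjust if your y_test is one-hot encoded.
import numpy as np
from sklearn.metrics import confusion_matrix

# same preprocessing as the generator (rescale=1./255), kept in the original order
x_scaled = x / 255.0
y_pred = model.predict(x_scaled, batch_size=1).argmax(axis=1)

# if y_test is one-hot encoded, pass np.argmax(y_test, axis=1) instead
matrix = confusion_matrix(y_test, y_pred)  # rows: actual class, columns: predicted class
print(matrix)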
