Proper DataLoader setup to train fasterrcnn_resnet50_fpn for object detection with PyTorch

I am trying to train PyTorch's torchvision.models.detection.fasterrcnn_resnet50_fpn to detect objects in my own images.
According to the documentation, this model expects a list of images and a list of dictionaries with
'boxes' and 'labels' as keys. So my dataset's __getitem__() looks like this:
def __getitem__(self, idx):
    # load the image
    _, img = self.images[idx].getImage()
    img = Image.fromarray(img, mode='RGB')
    objects = self.images[idx].objects
    boxes = []
    labels = []
    for o in objects:
        # append bbox to boxes in [x1, y1, x2, y2] format
        boxes.append([o.x, o.y, o.x + o.width, o.y + o.height])
        # append the 4th char of class_id, the number of lights (1-4)
        labels.append(int(str(o.class_id)[3]))
    # convert everything into a torch.Tensor
    boxes = torch.as_tensor(boxes, dtype=torch.float32)
    labels = torch.as_tensor(labels, dtype=torch.int64)
    target = {}
    target["boxes"] = boxes
    target["labels"] = labels
    # transforms consists only of transforms.Compose([transforms.ToTensor()]) for the time being
    if self.transforms is not None:
        img = self.transforms(img)
    return img, target
To the best of my knowledge, it returns exactly what's asked for. My DataLoader looks like this:
data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=4, shuffle=False, num_workers=2)
However, when it gets to this stage:
for images, targets in dataloaders[phase]:
it raises
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 12 and 7 in dimension 1 at C:\w\1\s\windows\pytorch\aten\src\TH/generic/THTensor.cpp:689
Can someone point me in the right direction?

@jodag was right: I had to write a separate collate function for the network to receive the data the way it was supposed to. In my case I only needed to bypass the default collate function.
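For reference, a minimal version of such a collate function (this mirrors the collate_fn in torchvision's detection reference scripts) simply keeps each batch as tuples instead of trying to stack tensors of different sizes:

def collate_fn(batch):
    # images differ in size and each image has a different number of boxes,
    # so keep them as tuples instead of stacking them into a single tensor
    return tuple(zip(*batch))

data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=4, shuffle=False, num_workers=2,
    collate_fn=collate_fn)

Each batch then arrives as a tuple of images and a tuple of target dicts, which can be passed to the model as model(list(images), list(targets)) during training.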

Related

Getting Error: TypeError: cross_entropy_loss(): argument 'target' (position 2) must be Tensor, not tuple

I am working on a CNN multi-class classification of different concentrations (10uM, 30uM, etc.). I create my dataset with the images as the features and the concentrations as labels. Note that the concentrations are left as strings. When running the code, I get the following error:
TypeError: cross_entropy_loss(): argument 'target' (position 2) must be Tensor, not tuple
The following is my dataset class:
class CustomDataset(Dataset):
    def __init__(self, path, method):
        """
        Args:
            csv_path (string): path to csv file
            data_path (string): path to the folder where images are
            transform: pytorch transforms for transforms and tensor conversion
        """
        # Transforms
        self.to_tensor = transforms.ToTensor()
        # Read the excel file
        self.data_path = pd.read_excel(path, sheet_name=method)
        # First column contains the image paths
        self.img_arr = np.asarray(self.data_path.iloc[:, 0])
        # Second column is the labels
        self.label_arr = np.asarray(self.data_path.iloc[:, 1])

    def __getitem__(self, index):
        # Get image name from the pandas df
        img_path = self.img_arr[index]
        # Open image
        img = cv2.imread(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # Converts the image from BGR to RGB
        # Transform image to tensor
        img_tensor = self.to_tensor(img)
        # Get label(class) of the image based on the cropped pandas column
        img_label = self.label_arr[index]
        return (img_tensor, img_label)

    def __len__(self):
        return len(self.data_path)
I am aware that the reason is most probably that the labels are left as strings, which the default collate function batches into tuples, so the loss function is unable to compare the CNN output with the label. However, I am unable to find any resources that explain how labels of this type are dealt with in multi-class classification. The solution seems simple, but I am a bit confused about how to solve it. Can anyone direct me?
EDIT: This is the implemented training loop:
def train_epoch(model, dataloader, loss_fn, optimizer):
    train_loss, train_correct = 0.0, 0
    model.train()  # Sets the mode to train (helpful when using layers such as Dropout and BatchNorm)
    for features, labels in dataloader:
        # Zero grad
        optimizer.zero_grad()
        # Forward pass
        output = model(features)
        print(output)
        print(labels)
        loss = loss_fn(output, labels)
        # Backward pass
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * features.size(0)  # features.size(0) is the batch size
        scores, predictions = torch.max(output.data, 1)  # max over dim 1 gives the predicted class per row
        train_correct += (predictions == labels).sum().item()
    return train_loss, train_correct
This is the output of "output" and "labels", respectively:
tensor([[-0.0528, -0.0150, -0.0153, -0.0939, -0.0887, -0.0863]],
       grad_fn=<AddmmBackward0>)
('70uM',)
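One common fix (a sketch, assuming the set of possible concentrations is known up front; the mapping below is hypothetical) is to encode each concentration string as an integer class index inside the dataset, so __getitem__ returns a label that cross_entropy can consume:

# hypothetical mapping from concentration strings to class indices;
# replace with the actual concentrations present in the data
CLASS_TO_IDX = {'10uM': 0, '30uM': 1, '50uM': 2, '70uM': 3, '100uM': 4, '300uM': 5}

def __getitem__(self, index):
    img = cv2.imread(self.img_arr[index])
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img_tensor = self.to_tensor(img)
    # integer class index instead of the raw string label
    img_label = torch.tensor(CLASS_TO_IDX[self.label_arr[index]], dtype=torch.long)
    return img_tensor, img_label

The default collate function then stacks these scalar labels into one LongTensor per batch, which is the target format nn.CrossEntropyLoss expects.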

Problem reading and augmenting images in tf.data API using CSV / pandas DataFrames

I'm trying to (pre)process and augment my data and target variables when reading in the data each epoch/batch using the tf.data API. My unprocessed data is a CSV/pandas DataFrame with the format
index, img_id, c1, ..., c5, where img_id contains the path to an image while c1, ..., c5 are run-length encodings of different defects in the image; both are strings. To increase the amount of data, I want to augment (e.g. flip) the images (and therefore the masks of defects as well) with a certain probability for each image when reading it each batch/epoch. I want to read each image from my drive to save memory, and because this seems to still yield good performance within the API (due to prefetching etc.).
I'm familiar with doing this using PyTorch's DataLoader API (using version 1.8.1+cu111), but as this is for a course where I have to use TensorFlow (using version 2.4.1), I read up on the tf.data API and came to the conclusion that I should do this augmentation and reading of the image using the map function. However, even reading the images throws different errors. The following is a mix of the code I've tried to use; most lines for reading the images are commented out, with an extra comment in the line above giving the error message it produces.
import tensorflow as tf
import numpy as np

test = tf.data.experimental.make_csv_dataset("data/mini_formatted.csv", batch_size=4)

def map_fn(df_):
    img_path = df_["img_id"]
    masks = restore_masks(df_)  # get masks from RLE, same shape as the images
    imgs = []
    # has to be declared before loop with correct shape, used for reading imgs later
    img = np.empty(shape=(256, 1600, 1), dtype=np.float32)
    # produces TypeError: Can't convert object of type 'Tensor' to 'str' for 'filename'
    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    for i in img_path:
        # produces TypeError: Can't convert object of type 'Tensor' to 'str' for 'filename'
        #img = cv2.imread(i, cv2.IMREAD_GRAYSCALE)
        # produces AttributeError: 'NoneType' object has no attribute 'shape'
        #img = cv2.imread(str(i), cv2.IMREAD_GRAYSCALE)
        # produces ValueError: 'img' has shape (256, 1600, 1) before the loop, but shape <unknown> after one iteration. Use tf.autograph.experimental.set_loop_options to set shape invariants.
        #img_file = tf.io.read_file(i)
        #img = tf.io.decode_image(img_file, dtype=tf.float32, channels=1)
        #imgs.append(img)
        pass
    # since img_path is a list, this doesn't work either
    # ValueError: Shape must be rank 0 but is rank 1 for '{{node ReadFile}} = ReadFile[](args_6)' with input shapes: [4].
    img_file = tf.io.read_file(img_path)
    img = tf.io.decode_image(img_file, dtype=tf.float32)
    ##########################################
    #
    # DO AUGMENTING PER BATCH HERE
    #
    ##########################################
    # return augmented images and masks
    return imgs, class_masks

proc_ds = test.map(map_fn)
As you can see, reading the image throws different errors I do not quite understand, especially because reading the image as follows (i.e. with the exact same commands after getting the first batch from the dataset, without applying the map function) works without problems.
it = test.as_numpy_iterator()
x_proc = it.next()
img_files = [tf.io.read_file(i) for i in x_proc["img_id"]]
imgs = [tf.io.decode_image(img_file, dtype=tf.float32, channels=1) for img_file in img_files]
From my understanding, using the map function on a dataset should execute the code on each example once per epoch, but from the example given, it seems the function is executed once per batch, which I tried to work around. This doesn't explain to me why the same code doesn't work inside the map function while working fine outside it.
To help understand what I want to do, I've written a short Dataset/DataLoader in torch as an example of what my desired outputs are.
import torch
import pandas as pd

class MyDataset(torch.utils.data.Dataset):
    def __init__(self, df, mode="train", shuffle=True, augment=False, union=False,
                 greyscale=False, normalize=True):
        self.df = df
        self.length = len(df)
        self.mode = mode
        self.shuffle = shuffle
        self.augment = augment
        self.union = union
        self.greyscale = greyscale
        self.normalize = normalize

    def __len__(self):
        return self.length

    def __getitem__(self, idx_):
        # gets called for a single item when added to batch -> one line of the dataframe
        # in the tf example, these are grouped in an OrderedDict with arrays of length (BATCH_SIZE) as values
        df_ = self.df.loc[idx_]
        img = self._load_img(df_["img_id"])
        if self.union:
            masks = build_masks(df_["c1":"c_all"], union_only=True)
        else:
            masks = build_masks(df_["c1":"c_all"])
        # could also add augmentation here instead of in collate_ds
        if self.mode == "train":
            return {"img": img, "masks": masks}
        return {"img": img, "masks": None}

    def _load_img(self, img_path):
        if self.greyscale:
            img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
        else:
            img = cv2.imread(img_path)
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        if self.normalize:
            img = img.astype(np.float32) / 255.
        else:
            img = img.astype(np.float32)
        return img

    def collate_ds(self, batch):
        # gets called with BATCH_SIZE examples that were processed using __getitem__
        imgs = [d["img"] for d in batch]
        masks = [d["masks"] for d in batch]
        if self.augment:
            # augmentation steps for each image
            pass
        imgs = torch.tensor(imgs, dtype=torch.float32)
        masks = torch.tensor(masks, dtype=torch.float32)
        res = (imgs, masks)
        return res

mini_df = pd.read_csv("data/mini_formatted.csv", index_col=0)
torch_ds = MyDataset(mini_df, mode="train", shuffle=True, augment=False, union=False,
                     greyscale=False, normalize=True)
dataloader = torch.utils.data.DataLoader(torch_ds, batch_size=8, shuffle=True,
                                         collate_fn=torch_ds.collate_ds)
batch = next(iter(dataloader))
print(batch[0].shape, batch[1].shape)
# output: (torch.Size([8, 256, 1600, 3]), torch.Size([8, 256, 1600, 5]))
I still don't understand why even reading the images inside the map function doesn't work: with cv2, neither imread(img_path) (TypeError: Can't convert object of type 'Tensor' to 'str' for 'filename') nor imread(str(i)) (AttributeError: 'NoneType' object has no attribute 'shape', i.e. the image wasn't found) works, and the tf.io.* functions work outside the map function but throw errors when the exact same code is executed inside it.
I would be very thankful for any help on what I'm misunderstanding/doing wrong using the map function with the tf.data API and how I could achieve the same results as the provided torch dataloader using the tf.data API.
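For what it's worth, a sketch of the usual tf.data pattern (assuming the column names from the question; restoring the masks from the RLE strings is left as a comment, since it would need either a pure-TF decoder or tf.py_function): make_csv_dataset yields whole batches, and inside map the tensors are symbolic graph-mode placeholders, which is why cv2 (which needs a concrete Python string) fails while tf.io ops, applied per example, work:

import tensorflow as tf

# unbatch so map_fn sees one example (scalar tensors) at a time;
# make_csv_dataset always yields batches, hence the earlier rank-1 errors
ds = tf.data.experimental.make_csv_dataset(
    "data/mini_formatted.csv", batch_size=1).unbatch()

def map_fn(row):
    # tf.io ops accept symbolic tensors inside map; cv2.imread does not,
    # because the tensor has no concrete string value at graph-build time
    img_file = tf.io.read_file(row["img_id"])
    img = tf.io.decode_image(img_file, dtype=tf.float32, channels=1)
    img.set_shape((256, 1600, 1))  # decode_image returns an unknown static shape
    # masks would be restored from the RLE columns (c1..c5) here,
    # e.g. via tf.py_function wrapping the existing numpy decoder
    return img

proc_ds = ds.map(map_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE)
proc_ds = proc_ds.batch(8).prefetch(tf.data.experimental.AUTOTUNE)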

Converting two NumPy data sets into a particular PyTorch data set

I want to play around with a neural network that recognizes handwritten numbers. I found some examples on the web that use PyTorch; however, they seem to download the data from the MNIST website in a particular format. My data is, however, available as follows:
with np.load('prediction-challenge-01-data.npz') as fh:
    data_x = fh['data_x']
    data_y = fh['data_y']
Where data_x is the training data and data_y are the labels of the pictures. I want these data sets to be in the same format as trainloader as shown below:
trainset = datasets.MNIST('/data/mnist', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
Where trainloader already has the training set data_x and labels data_y together in one set.
Is there any way to do this?
Edit: Shapes of data_x and data_y:
In [1]: data_x.shape
Out[1]: (20000, 1, 28, 28)

In [2]: data_y.shape
Out[2]: (20000,)
You can easily create your own dataset. Just inherit from torch.utils.data.Dataset and implement
__getitem__ and __len__ at the very least.
Here is a quick and dirty example to get you going:
class YourOwnDataset(torch.utils.data.Dataset):
    def __init__(self, input_file_path, transformations):
        super().__init__()
        self.path = input_file_path
        self.transforms = transformations
        with np.load(self.path) as fh:
            # I assume fh['data_x'] is an array of images, you get the idea
            self.data = fh['data_x']
            self.labels = fh['data_y']

    # in __getitem__, we retrieve one item based on the input index
    def __getitem__(self, index):
        data = self.data[index]
        # based on the loss you chose and what you have in mind,
        # you can transform your label; here I assume they are
        # integer numbers (like 1, 3, etc. as labels used for classification)
        label = self.labels[index]
        # reshape the (1, 28, 28) array to (28, 28, 1) so ToTensor()
        # can convert it back into a (1, 28, 28) tensor
        img = np.transpose(data, (1, 2, 0))
        img = self.transforms(img)
        return img, label

    def __len__(self):
        return len(self.data)
and you can create your dataset like this:

from torchvision import transforms

# add any number of transformations you like, I just added ToTensor()
transformations = transforms.Compose([transforms.ToTensor()])
trainset = YourOwnDataset('prediction-challenge-01-data.npz', transformations)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
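Since the arrays are already shaped (20000, 1, 28, 28) and (20000,), a shorter alternative is also worth mentioning (a sketch, assuming everything fits in memory and no per-image transforms are needed): wrap the tensors directly in torch.utils.data.TensorDataset:

import numpy as np
import torch

with np.load('prediction-challenge-01-data.npz') as fh:
    data_x = fh['data_x']
    data_y = fh['data_y']

# the tensors keep the (N, 1, 28, 28) layout, so no reshaping is needed
trainset = torch.utils.data.TensorDataset(
    torch.from_numpy(data_x).float(),
    torch.from_numpy(data_y).long())
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)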

How can I properly get my Dataset to create?

I have the following code:
imagepaths = tf.convert_to_tensor(imagepaths, dtype=tf.string)
labels = tf.convert_to_tensor(labels, dtype=tf.int32)
# Build a TF Queue, shuffle data
image, label = tf.data.Dataset.from_tensor_slices((imagepaths, labels))
and am getting the following error:
image, label = tf.data.Dataset.from_tensor_slices((imagepaths, labels))
ValueError: too many values to unpack (expected 2)
Shouldn't Dataset.from_tensor_slices see this as the length of the tensor, not the number of inputs? How can I fix this issue or combine the data tensors into the same variable more effectively?
Just for reference:
There are 1800 imagepaths and 1800 labels corresponding to each other. And to be clear, the imagepaths are paths to the files where the jpg images are located. My goal after this is to shuffle the data set and build the neural network model.
That code is right here:
# Read images from disk
image = tf.read_file(image)
image = tf.image.decode_jpeg(image, channels=CHANNELS)
# Resize images to a common size
image = tf.image.resize_images(image, [IMG_HEIGHT, IMG_WIDTH])
# Normalize
image = image * 1.0 / 127.5 - 1.0
# Create batches
X, Y = tf.train.batch([image, label], batch_size=batch_size,
                      capacity=batch_size * 8,
                      num_threads=4)
Try this:

def transform(entry):
    img = entry[0]
    lbl = entry[1]
    return img, lbl

raw_data = list(zip(imagepaths, labels))
dataset = tf.data.Dataset.from_tensor_slices(raw_data)
dataset = dataset.map(transform)
and if you want to have a look at your dataset you can do it like this:

for e in dataset.take(1):
    print(e)
You can add multiple map functions, and after that you can use shuffle and batch on your dataset to prepare it for training ;)
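As a side note, the ValueError in the question comes from the unpacking itself: from_tensor_slices returns a single Dataset object, not an (image, label) pair. A minimal sketch (assuming imagepaths and labels are the two 1800-element lists, and reusing the same ops as the question's follow-up code) that keeps the two tensors paired and decodes each image lazily:

dataset = tf.data.Dataset.from_tensor_slices((imagepaths, labels))

def load_image(path, label):
    # each dataset element is one (path, label) pair
    image = tf.read_file(path)
    image = tf.image.decode_jpeg(image, channels=CHANNELS)
    image = tf.image.resize_images(image, [IMG_HEIGHT, IMG_WIDTH])
    image = image * 1.0 / 127.5 - 1.0
    return image, label

dataset = dataset.map(load_image).shuffle(1800).batch(batch_size)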

How to load images with multiple JSON annotation in PyTorch

I would like to know how I can use the data loader in PyTorch for my custom file structure. I have gone through the PyTorch documentation, but all the examples there use separate folders per class.
My folder structure consists of 2 folders (called training and validation), each with 2 subfolders (called images and json_annotations). Each image in the "images" folder has multiple objects (like cars, cycles, men etc.), and each image is annotated in a separate JSON file. Standard COCO annotation is followed. My intention is to make a neural network which can do real-time classification from videos.
Edit 1:
I have done the coding as suggested by Fábio Perez.
class lDataSet(data.Dataset):
    def __init__(self, path_to_imgs, path_to_json):
        self.path_to_imgs = path_to_imgs
        self.path_to_json = path_to_json
        self.img_ids = os.listdir(path_to_imgs)

    def __getitem__(self, idx):
        img_id = self.img_ids[idx]
        img_id = os.path.splitext(img_id)[0]
        img = cv2.imread(os.path.join(self.path_to_imgs, img_id + ".jpg"))
        load_json = json.load(open(os.path.join(self.path_to_json, img_id + ".json")))
        #n = len(load_json)
        #bboxes = load_json['annotation'][n]['segmentation']
        return img, load_json

    def __len__(self):
        return len(self.img_ids)
When I try this
l_data = lDataSet(path_to_imgs='/home/training/images', path_to_json='/home/training/json_annotations')
I get l_data, where l_data[idx][0] is the image and l_data[idx][1] is the JSON annotation. Now I'm confused: how will I use it with the finetuning example available in PyTorch? In that example, the dataset and dataloader are created as shown below.
https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html
# Create training and validation datasets
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}
# Create training and validation dataloaders
dataloaders_dict = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size, shuffle=True, num_workers=4) for x in ['train', 'val']}
You should be able to implement your own dataset with data.Dataset. You just need to implement __len__ and __getitem__ methods.
In your case, you can iterate through all images in the image folder (then you can store the image ids in a list in your Dataset). Then, you use the index passed to __getitem__ to get the corresponding image id. With this image id, you can read the corresponding JSON file and return the target data that you need.
Something like this:
class YourDataLoader(data.Dataset):
    def __init__(self, path_to_imgs, path_to_json):
        self.path_to_imgs = path_to_imgs
        self.path_to_json = path_to_json
        self.image_ids = iterate_through_images(path_to_imgs)

    def __getitem__(self, idx):
        img_id = self.image_ids[idx]
        img = load_image(os.path.join(self.path_to_imgs, img_id))
        bboxes = load_bboxes(os.path.join(self.path_to_json, img_id))
        return img, bboxes

    def __len__(self):
        return len(self.image_ids)
In iterate_through_images you get all the ids (e.g. filenames) of images in a directory.
In load_bboxes you read the JSON and get the information you need.
I have a JSON loader implementation here if you want a reference.
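One practical caveat when wrapping such a dataset in a DataLoader (my addition, not part of the answer above): the JSON targets vary in length from image to image, so the default collate function cannot stack them into a batch. Passing a collate function that keeps the samples as tuples, like the one from the detection question at the top of this page, avoids that:

def collate_fn(batch):
    # keep variable-length (img, bboxes) pairs as tuples instead of stacking
    return tuple(zip(*batch))

# assuming image_datasets holds instances of the custom dataset above
dataloaders_dict = {
    x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size,
                                   shuffle=True, num_workers=4,
                                   collate_fn=collate_fn)
    for x in ['train', 'val']}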
