Getting Error: TypeError: cross_entropy_loss(): argument 'target' (position 2) must be Tensor, not tuple

I am working on a CNN multi-class classification of different concentrations (10uM, 30uM, etc.). I created my dataset with the images as the features and the concentrations as the labels. Note that the concentrations are left as strings. When running the code, I get the following error:
TypeError: cross_entropy_loss(): argument 'target' (position 2) must be Tensor, not tuple
The following is my dataset class:
import cv2
import numpy as np
import pandas as pd
from torch.utils.data import Dataset
from torchvision import transforms

class CustomDataset(Dataset):
    def __init__(self, path, method):
        """
        Args:
            path (string): path to the excel file
            method (string): name of the sheet to read
        """
        # Transforms
        self.to_tensor = transforms.ToTensor()
        # Read the excel file
        self.data_path = pd.read_excel(path, sheet_name=method)
        # First column contains the image paths
        self.img_arr = np.asarray(self.data_path.iloc[:, 0])
        # Second column is the labels
        self.label_arr = np.asarray(self.data_path.iloc[:, 1])

    def __getitem__(self, index):
        # Get image name from the pandas df
        img_path = self.img_arr[index]
        # Open image
        img = cv2.imread(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # Convert the image from BGR to RGB
        # Transform image to tensor
        img_tensor = self.to_tensor(img)
        # Get label (class) of the image; note it is still a string at this point
        img_label = self.label_arr[index]
        return (img_tensor, img_label)

    def __len__(self):
        return len(self.data_path)
I am aware that the problem is most likely that the labels are still strings, which the default collate function groups into a tuple, so the loss function cannot compare the CNN output with them. However, I have been unable to find any resources explaining how string labels are handled in multi-class classification. The solution seems simple, but I am a bit confused about how to proceed. Can anyone point me in the right direction?
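The usual approach is to encode each distinct concentration string as an integer class index and return that from the dataset, so that nn.CrossEntropyLoss receives a LongTensor target. A minimal sketch, assuming a hypothetical label set (replace class_names with the concentrations actually present in the spreadsheet; six entries match the six model outputs shown further below):

import torch

# Hypothetical label set -- replace with the concentrations actually present.
class_names = ['10uM', '30uM', '50uM', '70uM', '100uM', '200uM']
class_to_idx = {name: i for i, name in enumerate(class_names)}

def encode_label(label_str):
    # Map a string label such as '70uM' to an integer class index.
    return torch.tensor(class_to_idx[label_str], dtype=torch.long)

# In __getitem__: img_label = encode_label(self.label_arr[index])
# The default collate function then stacks these into a LongTensor of shape
# (batch,), which is exactly what nn.CrossEntropyLoss expects as its target.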
EDIT: This is the implemented training loop:
def train_epoch(model, dataloader, loss_fn, optimizer):
    train_loss, train_correct = 0.0, 0
    model.train()  # Sets the mode to train (helpful when using layers such as Dropout and BatchNorm)
    for features, labels in dataloader:
        # Zero grad
        optimizer.zero_grad()
        # Forward pass
        output = model(features)
        print(output)
        print(labels)
        loss = loss_fn(output, labels)
        # Backward pass
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * features.size(0)  # features.size(0) is useful when using batches
        scores, predictions = torch.max(output.data, 1)  # dim=1: max over the class dimension, per row
        train_correct += (predictions == labels).sum().item()
    return train_loss, train_correct
This is the output of "output" and "labels", respectively:
tensor([[-0.0528, -0.0150, -0.0153, -0.0939, -0.0887, -0.0863]],
grad_fn=<AddmmBackward0>)
('70uM',)

Related

Image captioner generator method from single image to batch

I was following the TensorFlow guide on image captioning linked here, and everything works great, but I want to convert the method that generates a caption for a single input image so that it takes a batch of images instead.
For example, this is the current generator method:
@Captioner.add_method
def simple_gen(self, image, temperature=1):
    initial = self.word_to_index([['[START]']])  # (batch, sequence)
    img_features = self.feature_extractor(image[tf.newaxis, ...])
    tokens = initial  # (batch, sequence)
    for n in range(50):
        preds = self((img_features, tokens)).numpy()  # (batch, sequence, vocab)
        preds = preds[:, -1, :]  # (batch, vocab)
        if temperature == 0:
            next = tf.argmax(preds, axis=-1)[:, tf.newaxis]  # (batch, 1)
        else:
            next = tf.random.categorical(preds/temperature, num_samples=1)  # (batch, 1)
        tokens = tf.concat([tokens, next], axis=1)  # (batch, sequence)
        if next[0] == self.word_to_index('[END]'):
            break
    words = idx_to_word(tokens[0, 1:-1])
    result = tf.strings.reduce_join(words, axis=-1, separator=' ')
    return result.numpy().decode()
It takes a single image, loaded by this function:
def load_img(img_path):
    img = tf.io.read_file(img_path)
    img = tf.io.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, IMAGE_SHAPE[:-1])
    return img
The load_img function takes an image path, and the generator returns the generated caption for that image.
What I tried: I have a tf dataset that contains a list of image paths and corresponding captions. I used the following code to load all images in the dataset, loop over them, and call the simple_gen method, but it is very slow and inefficient, and I am looking for a better way to optimize it:
for (img, capt) in test_raw.map(lambda img, capt: (load_img(img), capt)):
    preds = []
    for t in [0.0, 0.5, 1.0]:
        result = model.simple_gen(img, temperature=t)
        preds.append(result)
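One way to speed this up, sketched here as an illustration rather than the tutorial's own code, is to run the decoding loop once for a whole batch of images, keeping a per-sequence "finished" mask so that sequences which have already emitted [END] are simply padded. This assumes the Captioner attributes behave as in simple_gen above; the method name batch_gen is hypothetical:

@Captioner.add_method
def batch_gen(self, images, temperature=1):
    # images: (batch, H, W, C), already resized as in load_img
    batch_size = tf.shape(images)[0]
    img_features = self.feature_extractor(images)
    start = tf.cast(self.word_to_index('[START]'), tf.int64)
    end = tf.cast(self.word_to_index('[END]'), tf.int64)
    tokens = tf.fill([batch_size, 1], start)  # (batch, 1)
    done = tf.zeros([batch_size], dtype=tf.bool)
    for n in range(50):
        preds = self((img_features, tokens)).numpy()  # (batch, sequence, vocab)
        preds = preds[:, -1, :]  # (batch, vocab)
        if temperature == 0:
            next_tok = tf.argmax(preds, axis=-1)[:, tf.newaxis]  # (batch, 1)
        else:
            next_tok = tf.random.categorical(preds/temperature, num_samples=1)
        # Sequences that already finished keep emitting [END] as padding.
        next_tok = tf.where(done[:, tf.newaxis], end, next_tok)
        done = done | (tf.squeeze(next_tok, axis=1) == end)
        tokens = tf.concat([tokens, next_tok], axis=1)  # (batch, sequence)
        if tf.reduce_all(done):
            break
    # Decode each row, dropping the leading [START]; trailing [END] padding
    # can be trimmed per row if needed.
    words = idx_to_word(tokens[:, 1:])
    return [s.decode() for s in
            tf.strings.reduce_join(words, axis=-1, separator=' ').numpy()]

The main saving is that the model forward pass inside the loop now runs once per step for the whole batch instead of once per image.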

Problem reading and augmenting images in tf.data API using CSV / pandas DataFrames

I'm trying to (pre)process and augment my data and target variables when reading in the data each epoch/batch using the tf.data API. My unprocessed data is a CSV/pandas DataFrame with the format
index, img_id, c1, ..., c5, where img_id contains the path to an image and c1, ..., c5 are run-length encodings of different defects in the image; both are strings. To increase the amount of data, I want to augment (e.g. flip) the images (and therefore the masks of defects as well) with a certain probability for each image when reading it each batch/epoch. I want to read each image from my drive to save memory, and because this seems to still yield good performance within the API (due to prefetching etc.).
I'm familiar with doing this using PyTorch's DataLoader API (version 1.8.1+cu111), but as this is for a course where I have to use TensorFlow (version 2.4.1), I read up on the tf.data API and came to the conclusion that I should do this augmentation and reading of the image using the map function. However, even reading the images throws different errors. The following is a mix of the code I've tried to use; most lines for reading the images are commented out, with an extra comment in the line above giving the error message it produces.
import cv2
import numpy as np
import tensorflow as tf

test = tf.data.experimental.make_csv_dataset("data/mini_formatted.csv", batch_size=4)

def map_fn(df_):
    img_path = df_["img_id"]
    masks = restore_masks(df_)  # get masks from RLE with the same shape as the images
    imgs = []
    # has to be declared before the loop with the correct shape, used for reading imgs later
    img = np.empty(shape=(256, 1600, 1), dtype=np.float32)
    # produces TypeError: Can't convert object of type 'Tensor' to 'str' for 'filename'
    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    for i in img_path:
        # produces TypeError: Can't convert object of type 'Tensor' to 'str' for 'filename'
        #img = cv2.imread(i, cv2.IMREAD_GRAYSCALE)
        # produces AttributeError: 'NoneType' object has no attribute 'shape'
        #img = cv2.imread(str(i), cv2.IMREAD_GRAYSCALE)
        # produces ValueError: 'img' has shape (256, 1600, 1) before the loop, but shape <unknown> after one iteration. Use tf.autograph.experimental.set_loop_options to set shape invariants.
        #img_file = tf.io.read_file(i)
        #img = tf.io.decode_image(img_file, dtype=tf.float32, channels=1)
        #imgs.append(img)
        pass
    # since img_path is a list, this doesn't work either
    # ValueError: Shape must be rank 0 but is rank 1 for '{{node ReadFile}} = ReadFile[](args_6)' with input shapes: [4].
    img_file = tf.io.read_file(img_path)
    img = tf.io.decode_image(img_file, dtype=tf.float32)
    ##########################################
    #
    # DO AUGMENTING PER BATCH HERE
    #
    ##########################################
    # return augmented images and masks
    return imgs, class_masks

proc_ds = test.map(map_fn)
As you can see, reading the image throws different errors I do not quite understand, especially because reading the image as follows (i.e. with the exact same commands, after getting the first batch from the dataset without applying the map function) works without problems:
it = test.as_numpy_iterator()
x_proc = it.next()
img_files = [tf.io.read_file(i) for i in x_proc["img_id"]]
imgs = [tf.io.decode_image(img_file, dtype=tf.float32, channels=1) for img_file in img_files]
From my understanding, using the map function on a dataset should execute the code on each example once per epoch, but from the example given it seems the function is executed once per batch, which I tried to work around. This doesn't explain to me why the same code doesn't work inside the map function while working fine outside of it.
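Two observations, offered as a likely explanation rather than a verified diagnosis: make_csv_dataset already returns batched elements (batch_size=4 above), so map sees one batch per call rather than one example; and inside map the arguments are symbolic graph tensors, which cv2.imread cannot accept (hence the 'Tensor' to 'str' errors), while the tf.io ops can. A minimal sketch that makes the mapped function run per example:

# Undo the batching so the mapped function sees a single CSV row at a time.
per_example = test.unbatch()

def parse_one(row):
    img_file = tf.io.read_file(row["img_id"])  # scalar string tensor: rank 0, so ReadFile works
    img = tf.io.decode_image(img_file, dtype=tf.float32, channels=1)
    img.set_shape((256, 1600, 1))  # decode_image returns an unknown static shape
    return img

imgs_ds = per_example.map(parse_one).batch(4)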
To help understand what I want to do, I've written a short Dataset/DataLoader in torch as an example of what my desired outputs are.
import cv2
import numpy as np
import pandas as pd
import torch

class MyDataset(torch.utils.data.Dataset):
    def __init__(self, df, mode="train", shuffle=True, augment=False, union=False,
                 greyscale=False, normalize=True):
        self.df = df
        self.length = len(df)
        self.mode = mode
        self.shuffle = shuffle
        self.augment = augment
        self.union = union
        self.greyscale = greyscale
        self.normalize = normalize

    def __len__(self):
        return self.length

    def __getitem__(self, idx_):
        # gets called for a single item when added to a batch -> one line of the dataframe
        # in the tf example, these are grouped in an OrderedDict with arrays of length (BATCH_SIZE) as values
        df_ = self.df.loc[idx_]
        img = self._load_img(df_["img_id"])
        if self.union:
            masks = build_masks(df_["c1":"c_all"], union_only=True)
        else:
            masks = build_masks(df_["c1":"c_all"])
        # could also add augmentation here instead of in collate_ds
        if self.mode == "train":
            return {"img": img, "masks": masks}
        return {"img": img, "masks": None}

    def _load_img(self, img_path):
        if self.greyscale:
            img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
        else:
            img = cv2.imread(img_path)
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        if self.normalize:
            img = img.astype(np.float32) / 255.
        else:
            img = img.astype(np.float32)
        return img

    def collate_ds(self, batch):
        # gets called with BATCH_SIZE examples that were processed using __getitem__
        imgs = [d["img"] for d in batch]
        masks = [d["masks"] for d in batch]
        if self.augment:
            # augmentation steps for each image
            pass
        imgs = torch.tensor(imgs, dtype=torch.float32)
        masks = torch.tensor(masks, dtype=torch.float32)
        res = (imgs, masks)
        return res

mini_df = pd.read_csv("data/mini_formatted.csv", index_col=0)
torch_ds = MyDataset(mini_df, mode="train", shuffle=True, augment=False, union=False,
                     greyscale=False, normalize=True)
dataloader = torch.utils.data.DataLoader(torch_ds, batch_size=8, shuffle=True,
                                         collate_fn=torch_ds.collate_ds)
batch = next(iter(dataloader))
print(batch[0].shape, batch[1].shape)
# output: (torch.Size([8, 256, 1600, 3]), torch.Size([8, 256, 1600, 5]))
I still don't understand why even reading the images inside the map function doesn't work: with cv2, neither imread(img_path) (TypeError: Can't convert object of type 'Tensor' to 'str' for 'filename') nor imread(str(i)) (AttributeError: 'NoneType' object has no attribute 'shape', i.e. the image wasn't found) works, while the tf.io.* functions work outside the map function but throw errors when the exact same code is executed inside it.
I would be very thankful for any help on what I'm misunderstanding/doing wrong using the map function with the tf.data API and how I could achieve the same results as the provided torch dataloader using the tf.data API.
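For what it's worth, here is a sketch of a tf.data pipeline that mirrors the torch DataLoader above. It assumes the RLE decoding can be expressed in TensorFlow ops; the name build_masks_tf is hypothetical, and if the decoding has to stay in numpy, tf.numpy_function is the usual escape hatch:

import pandas as pd
import tensorflow as tf

df = pd.read_csv("data/mini_formatted.csv", index_col=0)

def parse_and_augment(img_path, rle_row):
    img_file = tf.io.read_file(img_path)  # per-example: img_path is a scalar string tensor
    img = tf.io.decode_image(img_file, dtype=tf.float32, channels=1)
    img.set_shape((256, 1600, 1))
    masks = build_masks_tf(rle_row)  # hypothetical TF reimplementation of build_masks
    # flip image and masks together with probability 0.5 (AutoGraph turns this into tf.cond)
    if tf.random.uniform(()) < 0.5:
        img = tf.image.flip_left_right(img)
        masks = tf.image.flip_left_right(masks)
    return img, masks

ds = (tf.data.Dataset.from_tensor_slices((df["img_id"].values, df.loc[:, "c1":"c5"].values))
      .shuffle(len(df))
      .map(parse_and_augment, num_parallel_calls=tf.data.AUTOTUNE)
      .batch(8)
      .prefetch(tf.data.AUTOTUNE))

Mapping before batching is what makes the per-example reads and augmentation work; batching afterwards reproduces the shapes the torch collate function produced.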

How can I properly create my Dataset?

I have the following code:
imagepaths = tf.convert_to_tensor(imagepaths, dtype=tf.string)
labels = tf.convert_to_tensor(labels, dtype=tf.int32)
# Build a TF Queue, shuffle data
image, label = tf.data.Dataset.from_tensor_slices((imagepaths, labels))
and am getting the following error:
image, label = tf.data.Dataset.from_tensor_slices((imagepaths, labels))
ValueError: too many values to unpack (expected 2)
Shouldn't Dataset.from_tensor_slices see this as the length of the tensor, not the number of inputs? How can I fix this issue or combine the data tensors into the same variable more effectively?
Just for reference:
There are 1800 imagepaths and 1800 labels corresponding to each other. To be clear, the imagepaths are paths to the files where the JPG images are located. My goal after this is to shuffle the dataset and build the neural network model.
That code is right here:
# Read images from disk
image = tf.read_file(image)
image = tf.image.decode_jpeg(image, channels=CHANNELS)
# Resize images to a common size
image = tf.image.resize_images(image, [IMG_HEIGHT, IMG_WIDTH])
# Normalize
image = image * 1.0/127.5 - 1.0
# Create batches
X, Y = tf.train.batch([image, label], batch_size=batch_size,
                      capacity=batch_size * 8,
                      num_threads=4)
Try this:
def transform(entry):
    img = entry[0]
    lbl = entry[1]
    return img, lbl

raw_data = list(zip(imagepaths, labels))
dataset = tf.data.Dataset.from_tensor_slices(raw_data)
dataset = dataset.map(transform)
And if you want to have a look at your dataset, you can do it like this:
for e in dataset.take(1):
    print(e)
You can add multiple map functions, and after that you can use shuffle and batch on your dataset to prepare it for training ;)
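For illustration, a hedged sketch of that chaining, continuing from the dataset above (the buffer and batch sizes are arbitrary choices, not from the original answer):

# Shuffle over all 1800 examples, then batch; the sizes here are illustrative.
dataset = dataset.shuffle(buffer_size=1800)
dataset = dataset.batch(32)

for imgs, lbls in dataset.take(1):
    # Elements are still (path, label) pairs until a decoding map is added.
    print(imgs.shape, lbls.shape)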

Proper dataloader setup to train fasterrcnn-resnet50 for object detection with pytorch

I am trying to train PyTorch's torchvision.models.detection.fasterrcnn_resnet50_fpn to detect objects in my own images.
According to the documentation, this model expects a list of images and a list of dictionaries with
'boxes' and 'labels' as keys. So my dataset's __getitem__() looks like this:
def __getitem__(self, idx):
    # load images
    _, img = self.images[idx].getImage()
    img = Image.fromarray(img, mode='RGB')
    objects = self.images[idx].objects
    boxes = []
    labels = []
    for o in objects:
        # append bbox to boxes
        boxes.append([o.x, o.y, o.x + o.width, o.y + o.height])
        # append the 4th char of class_id, the number of lights (1-4)
        labels.append(int(str(o.class_id)[3]))
    # convert everything into a torch.Tensor
    boxes = torch.as_tensor(boxes, dtype=torch.float32)
    labels = torch.as_tensor(labels, dtype=torch.int64)
    target = {}
    target["boxes"] = boxes
    target["labels"] = labels
    # transforms consists only of transforms.Compose([transforms.ToTensor()]) for the time being
    if self.transforms is not None:
        img = self.transforms(img)
    return img, target
To the best of my knowledge, it returns exactly what is asked for. My dataloader looks like this:
data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=4, shuffle=False, num_workers=2)
However, when it gets to this stage:
for images, targets in dataloaders[phase]:
it raises
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 12 and 7 in dimension 1 at C:\w\1\s\windows\pytorch\aten\src\TH/generic/THTensor.cpp:689
Can someone point me in the right direction?
@jodag was right; I had to write a separate collate function in order for the net to receive the data as it was supposed to. In my case I only needed to bypass the default function.
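For reference, a minimal sketch of such a bypass; this mirrors the collate_fn used in torchvision's detection reference scripts and is shown as an illustration rather than the poster's exact code:

def collate_fn(batch):
    # Keep images and targets as tuples instead of stacking them into one tensor;
    # detection samples have a different number of boxes per image, so they cannot be stacked.
    return tuple(zip(*batch))

data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=4, shuffle=False, num_workers=2,
    collate_fn=collate_fn)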

Tensorflow input function returns invalid values (Tensor instead of Tensor dict)

I have been working on a standard image classification problem with TensorFlow. Most of the code is derived from the tutorials on www.tensorflow.org; the only major change is that I use my own data.
The images are already processed (same size, encoding, etc.) and sorted into appropriate folders called groupA and groupB. While most of the code worked without a problem (images get loaded from disk and decoded, labels are assigned, etc.), I encountered an unexpected roadblock.
labels = tf.constant([1.0 if 'groupA' in filename else 0.1 for filename in training_data])
file_names = tf.constant(training_data)
dataset = tf.data.Dataset.from_tensor_slices((file_names, labels))
def _parse_function(filename, label):
    image_string = tf.read_file(filename)
    image = tf.image.decode_png(filename)
    return image, label
dataset = dataset.map(_parse_function)
def create_input_fn_train(dataset):
    def input_fn():
        ds = dataset.shuffle(buffer_size=10000)
        ds = ds.batch(16)
        ds = ds.repeat()
        iterator = ds.make_one_shot_iterator()
        images, labels = iterator.get_next()
        return images, labels
    return input_fn
input_fn_train = create_input_fn_train(dataset)
model = tf.estimator.DNNClassifier(feature_columns=[construct_feature_columns(150, 150)],
                                   hidden_units=[1024, 100],
                                   optimizer=tf.train.AdamOptimizer(1e-4),
                                   n_classes=2,
                                   dropout=0.1,
                                   model_dir="./tmp/fon_model")
The input function is returning the wrong type of data, causing the following error:
ValueError("features should be a dictionary of `Tensor`s. Given type: <class 'tensorflow.python.framework.ops.Tensor'>",)
I tried to look up possible solutions, which led me to the question "tensorflow ValueError: features should be a dictionary of `Tensor`s. Given type: <class 'tensorflow.python.framework.ops.Tensor'>", and tried the provided solution:
def _parse_function(filename, label):
    image_string = tf.read_file(filename)
    image = tf.image.decode_png(filename)
    features = {}
    features['pixels'] = image
    return features, label
But it gave me another error message, this time:
ValueError("Items of feature_columns must be a _FeatureColumn. Given (type <class 'set'>): {_NumericColumn(key='pixels', shape=(22500,), default_value=None, dtype=tf.float32, normalizer_fn=None)}.",)
This leads me to believe that I made some sort of fundamental error, either when parsing the data or when assigning the labels, but I can't figure out where I went wrong.
EDIT:
I altered the code so that the estimator constructor is:
model = tf.estimator.DNNClassifier(feature_columns=construct_feature_columns(150, 150),
                                   hidden_units=[1024, 100],
                                   optimizer=tf.train.AdamOptimizer(1e-4),
                                   n_classes=2,
                                   dropout=0.1,
                                   model_dir="./tmp/fon_model")
and the construct_feature_columns into:
def construct_feature_columns(image_height, image_width):
    return set([tf.feature_column.numeric_column('pixels', shape=[image_height*image_width])])
Eliminating the ValueError:
ValueError("Items of feature_columns must be a _FeatureColumn. Given (type <class 'set'>): {_NumericColumn(key='pixels', shape=(22500,), default_value=None, dtype=tf.float32, normalizer_fn=None)}.",)
I also rewrote the input function into:
def input_fn():
    ds = dataset.shuffle(buffer_size=len(training_data))
    ds = ds.batch(16)
    ds = ds.repeat()
    iterator = ds.make_one_shot_iterator()
    images, labels = iterator.get_next()
    images = {'pixels': images}
    return images, labels
However, a new error turned up, confirming my belief that I messed up something fundamental:
UnimplementedError (see above for traceback): Cast string to float is not supported
[[Node: dnn/input_from_feature_columns/input_layer/pixels/ToFloat = Cast[DstT=DT_FLOAT, SrcT=DT_STRING, _device="/job:localhost/replica:0/task:0/device:CPU:0"](dnn/input_from_feature_columns/input_layer/pixels/ExpandDims)]]
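One observation, offered as a likely cause rather than a verified fix: in both versions of _parse_function above, tf.image.decode_png is called on filename instead of image_string, so the 'pixels' feature remains an undecoded string tensor, which would explain the final cast-string-to-float error. A minimal corrected sketch, flattening to match the numeric column's shape of 22500 = 150*150:

def _parse_function(filename, label):
    image_string = tf.read_file(filename)
    image = tf.image.decode_png(image_string, channels=1)  # decode the file bytes, not the filename
    image = tf.cast(image, tf.float32)
    image = tf.reshape(image, [-1])  # flatten to match numeric_column('pixels', shape=[150*150])
    return {'pixels': image}, label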
