image captioner generator method from single image to batch - python

i was following the tensorflow guide on image captioning linked here and everything is working great but i wanted to to convert this method that generates captions for input image to take a batch of images instead of 1
for example this the current generator method
#Captioner.add_method
def simple_gen(self, image, temperature=1):
initial = self.word_to_index([['[ٍSTART]']]) # (batch, sequence)
img_features = self.feature_extractor(image[tf.newaxis, ...])
tokens = initial # (batch, sequence)
for n in range(50):
preds = self((img_features, tokens)).numpy() # (batch, sequence, vocab)
preds = preds[:,-1, :] #(batch, vocab)
if temperature==0:
next = tf.argmax(preds, axis=-1)[:, tf.newaxis] # (batch, 1)
else:
next = tf.random.categorical(preds/temperature, num_samples=1) # (batch, 1)
tokens = tf.concat([tokens, next], axis=1) # (batch, sequence)
if next[0] == self.word_to_index('[END]'):
break
words = idx_to_word(tokens[0, 1:-1])
result = tf.strings.reduce_join(words, axis=-1, separator=' ')
return result.numpy().decode()
it takes one image output loaded by this function
def load_img(img_path):
img = tf.io.read_file(img_path)
img = tf.io.decode_jpeg(img,channels=3)
img = tf.image.resize(img,IMAGE_SHAPE[:-1])
return img
and load_img function takes img_path and the generator function returns generated caption for this image
what i tried is i have a tf dataset that contains a list img paths and corresponding captions i tried the following code to load all images in the tf dataset and loop over them and call the simple_gen method but it's very slow and inefficient and i'm looking for a better way to optimize the method
for (img,capt) in test_raw.map(lambda img,capt: (load_img(img),capt)):
preds = []
for t in [0.0,0.5,1.0]:
result = model.simple_gen(img)
preds.append(result)

Related

Getting Error: TypeError: cross_entropy_loss(): argument 'target' (position 2) must be Tensor, not tuple

I am working on a CNN multi-class classification of different concentrations (10uM, 30uM, etc.) I create my dataset to include the images as the features and the concentrations as labels. Note that the concentrations are left as a string. When running the code, I am getting the following error:
TypeError: cross_entropy_loss(): argument 'target' (position 2) must be Tensor, not tuple
The following is my dataset class:
class CustomDataset(Dataset):
def __init__(self, path, method):
"""
Args:
csv_path (string): path to csv file
data_path (string): path to the folder where images are
transform: pytorch transforms for transforms and tensor conversion
"""
# Transforms
self.to_tensor = transforms.ToTensor()
# Read the excel file
self.data_path = pd.read_excel(path, sheet_name=method)
# First column contains the image paths
self.img_arr = np.asarray(self.data_path.iloc[:, 0])
# Second column is the labels
self.label_arr = np.asarray(self.data_path.iloc[:, 1])
def __getitem__(self, index):
# Get image name from the pandas df
img_path = self.img_arr[index]
# Open image
img = cv2.imread(img_path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # Converts the image from BGR to RGB
# Transform image to tensor
img_tensor = self.to_tensor(img)
# Get label(class) of the image based on the cropped pandas column
img_label = self.to_tensor(self.label_arr[index])
img_label = self.label_arr[index]
return (img_tensor, img_label)
def __len__(self):
return len(self.data_path)
I am aware that the reason is most probably due to the fact that the labels are left as tuples, so the loss function is unable to compare the CNN output with the label. However, I am unable to find any resources that explain how labels are dealt with in multi-class classifications of tuple type labels. The solution seems simple, but I am a bit confused on how to solve it. Can anyone direct me?
EDIT: This is the implemented training loop:
def train_epoch(model,dataloader,loss_fn,optimizer):
train_loss,train_correct = 0.0, 0
model.train() #Sets the mode to train (Helpful when using layers such as DropOut and BatchNorm)
for features,labels in dataloader:
#Zero grad
optimizer.zero_grad()
#Forward Pass
output=model(features)
print(output)
print(labels)
loss=loss_fn(output,labels)
#Backward Pass
loss.backward()
optimizer.step()
train_loss += loss.item()*features.size(0) #features.size is useful when using batches.
scores, predictions = torch.max(output.data,1) # 1 is to create a 1 dimensional tensor with max values from each row
train_correct += (predictions==labels).sum().item()
return train_loss, train_correct
This is the output of "output" and "labels", respectively:
tensor([[-0.0528, -0.0150, -0.0153, -0.0939, -0.0887, -0.0863]],
grad_fn=<AddmmBackward0>)
('70uM',)

Problem reading and augmenting images in tf.data API using CSV / pandas DataFrames

I'm trying to (pre)process and augment my data and target variables when reading in the data each epoch/batch using the tf.data API. My unprocessed data is a CSV/pandas DataFrame with the format
index, img_id, c1, ..., c5 where img_id contains the path to an image while c1,...,c5 are run length encodings of different defects in the image, both are strings. To increase the amount of data I want to augment (e.g. flip) the images (and therefore the masks of defects aswell) with a certain probability for each image when reading it each batch/epoch. I want to read each image from my drive to save memory and because this seems to still yield good performance within the API (due to prefetching etc).
I'm familiar doing this using pytorchs DataLoader API (using version 1.8.1+cu111), but as this is for a course where I have to use tensorflow (using version 2.4.1), I read up on the tf.data API and came to the conclusion that I should do this augmentation and reading of the image using the map function. However, even reading the images throws different errors. The following is a mix of the code I've tried to use, most lines for reading the images are commented out with an extra comment in the line above with the error message it will produce.
import tensorflow as tf
test = tf.data.experimental.make_csv_dataset("data/mini_formatted.csv", batch_size=4)
def map_fn(df_):
img_path = df_["img_id"]
masks = restore_masks(df_) # get maps from RLE with same shape as images
imgs = []
# has to be declared before loop with correct shape, used for reading imgs later
img = np.empty(shape=(256,1600,1), dtype=np.float32)
# produces TypeError: Can't convert object of type 'Tensor' to 'str' for 'filename'
img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
for i in img_path:
# produces TypeError: Can't convert object of type 'Tensor' to 'str' for 'filename'
#img = cv2.imread(i, cv2.IMREAD_GRAYSCALE)
# produces AttributeError: 'NoneType' object has no attribute 'shape'
#img = cv2.imread(str(i), cv2.IMREAD_GRAYSCALE)
# produces ValueError: 'img' has shape (256, 1600, 1) before the loop, but shape <unknown> after one iteration. Use tf.autograph.experimental.set_loop_options to set shape invariants.
#img_file = tf.io.read_file(i)
#img = tf.io.decode_image(img_file, dtype=tf.float32, channels=1)
#imgs.append(img)
pass
# since img_path is a list, this doesn't work either
# ValueError: Shape must be rank 0 but is rank 1 for '{{node ReadFile}} = ReadFile[](args_6)' with input shapes: [4].
img_file = tf.io.read_file(img_path)
img = tf.io.decode_image(img_file, dtype=tf.float32)
##########################################
#
# DO AUGMENTING PER BATCH HERE
#
##########################################
# return augmented images and masks
return imgs, class_masks
proc_ds = test.map(map_fn)
As you can see, reading the image throws different errors I do not quite unterstand, especially because reading the image as follows (i.e. with the exact same commands after getting the first batch from the dataset without applying the map function) works without problems.
it = test.as_numpy_iterator()
x_proc = it.next()
img_files = [tf.io.read_file(i) for i in x_proc["img_id"]]
imgs = [img = tf.io.decode_image(img_file, dtype=tf.float32, channels=1) for img_file in img_files]
From my understanding, using the map function on a dataset should execute the code on each example once per epoch, but from the example given, it seems the function is executed once per batch, what I tried to work around. This doesn't explain to me, why the same code doesn't work inside the map function, while working fine outside it.
To help understand what I want to do, I've written a short Dataset/DataLoader in torch as an example of what my desired outputs are.
import torch
import pandas as pd
class MyDataset(torch.utils.data.Dataset):
def __init__(self, df, mode="train", shuffle=True, augment=False, union=False,
greyscale=False, normalize=True):
self.df = df
self.length = len(df)
self.mode = mode
self.shuffle = shuffle
self.augment = augment
self.union = union
self.greyscale = greyscale
self.normalize = normalize
def __len__(self):
return self.length
def __getitem__(self, idx_):
# gets called for a single item when added to batch -> one line of the dataframe
# in the tf example, these are grouped in an OrderedDict with arrays of length (BATCH_SIZE) as values
df_ = self.df.loc[idx_]
img = self._load_img(df_["img_id"])
if self.union:
masks = build_masks(df_["c1":"c_all"], union_only=True)
else:
masks = build_masks(df_["c1":"c_all"])
# could also add augmentation here instead of in collate_ds
if self.mode == "train":
return {"img": img, "masks": masks}
return {"img": img, "masks": None}
def _load_img(self, img_path):
if self.greyscale:
img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
else:
img = cv2.imread(img_path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
if self.normalize:
img = img.astype(np.float32) / 255.
else:
img = img.astype(np.float32)
return img
def collate_ds(self, batch):
# gets called with BATCH_SIZE examples that were processed using __getitem__
imgs = [d["img"] for d in batch]
masks = [d["masks"] for d in batch]
if self.augment:
# augmentation steps for each image
pass
imgs = torch.tensor(imgs, dtype=torch.float32)
masks = torch.tensor(masks, dtype=torch.float32)
res = (imgs, masks)
return res
mini_df = pd.read_csv("data/mini_formatted.csv", index_col=0)
torch_ds = MyDataset(mini_df, mode="train", shuffle=True, augment=False, union=False,
greyscale=False, normalize=True)
dataloader = torch.utils.data.DataLoader(torch_ds, batch_size=8, shuffle=True,
collate_fn=torch_ds.collate_ds)
batch = next(iter(dataloader))
print(batch[0].shape, batch[1].shape)
# output: (torch.Size([8, 256, 1600, 3]), torch.Size([8, 256, 1600, 5]))
I still don't understand, why even reading the images inside the map function doesn't work (e.g. using cv2 -> neither using imread(img_path) #TypeError: Can't convert object of type 'Tensor' to 'str' for 'filename' nor imread(str(i) #AttributeError: 'NoneType' object has no attribute 'shape' -> image wasn't found works, while the tf.io.* functions work outside the function, but throw errors when the exact same code is executed inside it.
I would be very thankful for any help on what I'm misunderstanding/doing wrong using the map function with the tf.data API and how I could achieve the same results as the provided torch dataloader using the tf.data API.

how to create train, test & validation split of tf.data.Dataset in tf 2.1.0

the following code is copied from :
https://www.tensorflow.org/tutorials/load_data/images
the code aims to create dataset of images downloaded from the web and stored into folders depending upon their classes, please do refer to the link above for the whole context!
list_ds = tf.data.Dataset.list_files(str(data_dir/'*/*'))
for f in list_ds.take(5):
print(f.numpy())
def get_label(file_path):
# convert the path to a list of path components
parts = tf.strings.split(file_path, os.path.sep)
# The second to last is the class-directory
return parts[-2] == CLASS_NAMES
def decode_img(img):
# convert the compressed string to a 3D uint8 tensor
img = tf.image.decode_jpeg(img, channels=3)
# Use `convert_image_dtype` to convert to floats in the [0,1] range.
img = tf.image.convert_image_dtype(img, tf.float32)
# resize the image to the desired size.
return tf.image.resize(img, [IMG_WIDTH, IMG_HEIGHT])
def process_path(file_path):
label = get_label(file_path)
# load the raw data from the file as a string
img = tf.io.read_file(file_path)
img = decode_img(img)
return img, label
# Set `num_parallel_calls` so multiple images are loaded/processed in parallel.
labeled_ds = list_ds.map(process_path, num_parallel_calls=AUTOTUNE)
for image, label in labeled_ds.take(1):
print("Image shape: ", image.numpy().shape)
print("Label: ", label.numpy())
def prepare_for_training(ds, cache=True, shuffle_buffer_size=1000):
# This is a small dataset, only load it once, and keep it in memory.
# use `.cache(filename)` to cache preprocessing work for datasets that don't
# fit in memory.
if cache:
if isinstance(cache, str):
ds = ds.cache(cache)
else:
ds = ds.cache()
ds = ds.shuffle(buffer_size=shuffle_buffer_size)
# Repeat forever
ds = ds.repeat()
ds = ds.batch(BATCH_SIZE)
# `prefetch` lets the dataset fetch batches in the background while the model
# is training.
ds = ds.prefetch(buffer_size=AUTOTUNE)
return ds
train_ds = prepare_for_training(labeled_ds)
we are finally left with train_ds that is a PreffetchDataset object and contains the entire dataset of images, labels!
How to split train_ds into train, test & validation sets to feed it into a model?
After the ds.repeat() call the dataset is infinite and splitting an infinte dataset doesn't work very well. Therefore you should split it before the prepare_training() call. Like this:
labeled_ds = list_ds.map(process_path, num_parallel_calls=AUTOTUNE)
labeled_ds = labeled_ds.shuffle(10000).batch(BATCH_SIZE)
# Size of dataset
n = sum(1 for _ in labeled_ds)
n_train = int(n * 0.8)
n_valid = int(n * 0.1)
n_test = n - n_train - n_valid
train_ds = labeled_ds.take(n_train)
valid_ds = labeled_ds.skip(n_train).take(n_valid)
test_ds = labeled_ds.skip(n_train + n_valid).take(n_test)
The line n = sum(1 for _ in labeled_ds) iterates through the dataset once to get its size, then it is 3-way split into 80%/10%/10%.

How do I create image sequence samples using tf.data?

I want to create image sequence samples using the tf.data API. But as of now, it seems like there is no easy way to concatenate multiple images to form a single sample. I have tried to use the dataset.window function, which groups my images right. But I don't know how to concatenate them.
import tensorflow as tf
from glob import glob
IMG_WIDTH = 256
IMG_HEIGHT = 256
def load_and_process_image(path):
img = tf.io.read_file(path)
img = tf.image.decode_jpeg(img, channels=3)
img = tf.image.resize(img, [IMG_WIDTH, IMG_HEIGHT])
img = tf.reshape(img, shape=(IMG_WIDTH, IMG_HEIGHT, 1, 3))
return img
def create_dataset(files, time_distance=8, frame_step=1):
dataset = tf.data.Dataset.from_tensor_slices(files)
dataset = dataset.map(load_and_process_image)
dataset = dataset.window(time_distance, 1, frame_step, True)
# TODO: Concatenate elements from dataset.window
return dataset
files = sorted(glob('some/path/*.jpg'))
images = create_dataset(images)
I know that I could save my image sequences as TFRecords but that would make my data pipeline much more unflexible and would cost tons of memory.
My input batches should have the form N x W x H x T x C
(N: Number of samples
W: Image Width
H: Image Height
T: Image Sequence length
C: Image Channels).
You can use batching to create batches of size N.
iterations = #
batched_dataset = dataset.batch(N)
for batch in batched_dataset.take(iterations):
# process your batch
Here iterations is the number of batches you want to generate.

resize_images with a batch

I'm trying to read three jpg-Files to resize them with a tensorflow batch. No matter what I tried I didn't succeed. Here is one example below. In general how can I resize some pictures in a batch with tf.image.resize_images. I don't want to use an Input Reader. I want to create the batch of some pictures by myself.
I think it's neccessary to have 4 dimensions like batchsize, width, heigt, channels
import numpy as np
import tensorflow as tf
sess = tf.Session()
tensor_list = []
for i in range(3):
img = tf.read_file("{0}.jpg".format(i))
img_tensor = tf.image.decode_jpeg(img, 3)
img_resized = tf.image.resize_images(img_tensor, tf.convert_to_tensor([ 800, 400 ] ), tf.image.ResizeMethod.NEAREST_NEIGHBOR)
img_tensor_dim = tf.expand_dims(img_resized, 0)
tensor_list.append(img_tensor_dim)
batch = tf.train.batch(tensor_list, batch_size=3, enqueue_many=False)
img_resized = tf.image.resize_images(batch, tf.convert_to_tensor([400, 200]), tf.image.ResizeMethod.NEAREST_NEIGHBOR)
for i in range(3):
tmp = img_resized[i]
endcode_jpg = tf.image.encode_jpeg(tmp, x_density=96, y_density=96)
wr = tf.write_file('{0}_out.jpg'.format(i), endcode_jpg)
sess.run(wr)
You can use the tf.map_fn() operation to apply the resizing logic to a vector of strings containing your image data:
import tensorflow as tf
# Build a tensor containing the image data as a vector of strings.
images = []
for i in range(3):
images.append(tf.read_file("/tmp/jpeg420exif.jpg"))
images = tf.stack(images)
# `resize_fn()` contains the logic for resizing and encoding one image.
def resize_fn(img):
img_tensor = tf.image.decode_jpeg(img, 3)
img_resized = tf.image.resize_images(
[img_tensor], [800, 400], tf.image.ResizeMethod.NEAREST_NEIGHBOR)[0]
img_encoded = tf.image.encode_jpeg(img_resized, x_density=96, y_density=96)
return img_encoded
# `tf.map_fn()` applies `resize_fn()` to each image in turn, and
# returns a vector of encoded images.
encoded_images = tf.map_fn(resize_fn, images)
write_ops = []
for i in range(3):
write_ops.append(tf.write_file("{0}_out.jpg".format(i), encoded_images[i]))
with tf.Session() as sess:
sess.run(write_ops)

Categories

Resources