I have a classic dataset of images and labels.
Here is a simple representation of the __getitem__ function:
def __getitem__(self, index):
    (img_path, label) = self.df.iloc[index].values
    img = Image.open(img_path).convert("RGB")
    y = torch.tensor(label)
    return (img, y)
I have:
dataset = ClassDataset()
train_set, validation_set = random_split(dataset, [train_size, val_size])  # lengths of the two splits
train_loader = DataLoader(dataset=train_set, batch_size=32)
The shape of one batch from the train loader is [32, 3, 256, 256], with 32 being the batch size, 3 the number of channels, and 256 the height and width of my images.
I want to modify the shape of one batch so that it is sequential, [8, 4, 3, 256, 256], with 8 being the batch size and 4 the length of one sequence.
I know this could easily be done with Tensor.view() or torch.reshape(), since my data are already in the right order (they can be grouped directly into sequences).
But I want to know where the most sensible place to make this change is: in the dataset class, in the data loader, or in the training loop.
I already tried returning sequences from __getitem__:
(img_path, coords) = df.iloc[4*(index-1):4*index].values
(assuming a sequence length of 4), but it didn't work.
This kind of processing is best done in the dataset layer. What you implement there is essentially "given a dataset index, return the corresponding input and its label". In your case the input is a sequence, so it makes sense for __getitem__ to return a sequence of images.
The data loader will then collate the data automatically, so you get (batch_size, seq_len, channel, height, width) for your input and (batch_size, seq_len) for your labels (or (batch_size,) if there is meant to be a single label per sequence).
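For illustration, here is a minimal sketch of such a dataset (the class name SequenceDataset and the transform argument are my own additions, not your code), assuming the rows of df are already ordered so that every 4 consecutive rows form one sequence, and that the transform includes transforms.ToTensor():

import torch
from PIL import Image
from torch.utils.data import Dataset

class SequenceDataset(Dataset):
    def __init__(self, df, seq_len=4, transform=None):
        self.df = df
        self.seq_len = seq_len
        self.transform = transform

    def __len__(self):
        # one item per non-overlapping sequence
        return len(self.df) // self.seq_len

    def __getitem__(self, index):
        rows = self.df.iloc[index * self.seq_len:(index + 1) * self.seq_len].values
        imgs, labels = [], []
        for img_path, label in rows:
            img = Image.open(img_path).convert("RGB")
            if self.transform:
                img = self.transform(img)  # e.g. transforms.ToTensor()
            imgs.append(img)
            labels.append(label)
        # (seq_len, C, H, W) and (seq_len,); the DataLoader collates these into
        # (batch_size, seq_len, C, H, W) and (batch_size, seq_len)
        return torch.stack(imgs), torch.tensor(labels)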
To make my model more robust, I want to normalize my feature tensor.
I tried doing it the way that, to the best of my knowledge, is standard for images:
class Dataset(torch.utils.data.Dataset):
    'Characterizes a dataset for PyTorch'
    def __init__(self, input_tensor, transform=transforms.Normalize(mean=0.5, std=0.5)):
        self.labels = input_tensor[:, :, -1]
        self.features = input_tensor[:, :, :-1]
        self.transform = transform

    def __len__(self):
        return self.labels.shape[0]

    def __getitem__(self, index):
        # Load data and get label
        X = self.features[index]
        y = self.labels[index]
        if self.transform:
            X = self.transform(X)
        return X, y
But receive this error message:
ValueError: Expected tensor to be a tensor image of size (C, H, W). Got tensor.size() = torch.Size([8, 25]).
Everywhere I looked, people suggest using .view to add a third dimension in order to comply with the standard shape of images, but this seems very odd to me. Is there maybe a cleaner way to do this?
Also, where is the best place to apply the normalization: per batch, or over the entire training dataset?
You are asking two different questions; I will try to answer both.
Indeed, you should first reshape to (c, h, w), where c is the channel dimension. In most cases you will need that extra dimension because most 'image' layers are built to receive 3-dimensional tensors (not counting the batch dimension), such as nn.Conv2d, nn.BatchNorm2d, etc. I don't believe there is any way around it, and avoiding it would restrict you to single-channel image data anyway.
You can reshape to the desired shape with torch.reshape or Tensor.view:
X = X.reshape(1, *X.shape)
Or add the extra dimension with Tensor.unsqueeze:
X = X.unsqueeze(0)
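For example, inside __getitem__ (a sketch, assuming X has shape (8, 25) and a Normalize transform defined for a single channel):

X = X.unsqueeze(0)       # (8, 25) -> (1, 8, 25), i.e. (C, H, W) with one channel
X = self.transform(X)    # e.g. transforms.Normalize(mean=[0.5], std=[0.5])
X = X.squeeze(0)         # optionally drop the dummy channel again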
About normalization: batch normalization and dataset normalization are two different approaches.
The former is a technique that can improve the performance of convolutional networks. It can be implemented with an nn.BatchNorm2d layer and uses learnable parameters: a scale factor (~ std) and a bias (~ mean). This type of normalization is applied per batch, when the model is called.
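A minimal sketch of the layer-based approach (the layer sizes here are arbitrary):

import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),   # learnable per-channel scale and bias, applied per batch
    nn.ReLU(),
)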
The latter is a pre-processing technique that puts different features on the same scale. It can be applied inside the dataset, per element. It requires you to measure the mean and standard deviation of your training set.
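A sketch of the second approach for your (non-image) feature tensor, where train_features is a hypothetical name for the feature tensor of your training split, with shape (num_samples, 8, 25):

# per-feature statistics measured on the training set only
mean = train_features.mean(dim=(0, 1))   # shape (25,), one value per feature
std = train_features.std(dim=(0, 1))     # shape (25,)

# inside __getitem__, instead of transforms.Normalize:
X = (self.features[index] - mean) / (std + 1e-8)   # broadcasts over the (8, 25) item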
I'm setting up an image data pipeline in TensorFlow 2.1. I'm using a dataset of RGB images with variable shapes (h, w, 3), and I can't find a way to make it work. I get the following error when I call tf.data.Dataset.batch():
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot batch tensors with different shapes in component 0. First element had shape [256,384,3] and element 3 had shape [160,240,3]
I found the padded_batch method but I don't want my images to be padded to the same shape.
EDIT:
I think that I found a little workaround by using the function tf.data.experimental.dense_to_ragged_batch (which converts the dense tensor representation to a ragged one).
Unlike tf.data.Dataset.batch, the input elements to be batched may have different shapes, and each batch will be encoded as a tf.RaggedTensor
But then I have another problem. My dataset contains images and their corresponding labels. When I use the function like this:
ds = ds.map(
    lambda x: tf.data.experimental.dense_to_ragged_batch(batch_size)
)
I get the following error, because it tries to map the function to the entire dataset (thus to both images and labels), which is not possible since it can only be applied to a single tensor (not two).
TypeError: <lambda>() takes 1 positional argument but 2 were given
Is there a way to specify which of the two elements I want the transformation applied to?
I just hit the same problem. The solution turned out to be loading the data as two datasets and then using tf.data.Dataset.zip() to merge them.
dataset_images = dataset.map(parse_images, num_parallel_calls=tf.data.experimental.AUTOTUNE)
dataset_images = dataset_images.apply(
    tf.data.experimental.dense_to_ragged_batch(batch_size=batch_size, drop_remainder=True))
dataset_total_cost = dataset.map(get_total_cost)
dataset_total_cost = dataset_total_cost.batch(batch_size, drop_remainder=True)
dataset = tf.data.Dataset.zip((dataset_images, dataset_total_cost))
If you do not want to resize your images, you can only use a batch size of 1, i.e. train your model one image at a time. The error you reported says exactly that: you are using a batch size bigger than 1 and trying to put two images of different shapes in the same batch. You can either resize (or pad) your images to a fixed shape, or use a batch size of 1 as follows:
my_data = tf.data.Dataset(....) # with whatever arguments you use here
my_data = my_data.batch(1)
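If resizing is acceptable, a sketch of that alternative (the 256x256 target size and batch size of 32 are arbitrary) would be:

def resize_fn(image, label):
    image = tf.image.resize(image, [256, 256])
    return image, label

my_data = my_data.map(resize_fn).batch(32)   # batches larger than 1 now work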
I want to use a CNN to segment objects with a binary mask (0: object not present, 1: object present), but I have an issue with the data. The training data consists of 150 images in JPEG format, and the ground truth (label data) consists of 150 PNG rasters of 0s and 1s (i.e. black-and-white images).
Now the question is how to load this hybrid of training images and label images in Keras/TensorFlow. If there's a dummy example and/or demonstration of how to do that in Python, I would be grateful.
You can define one generator for reading the input images and another one for reading the labels using the ImageDataGenerator class and its flow_from_directory() method, and then combine these two generators in a single generator. Just make sure the directory structure and (order of) file names of input and label images are the same:
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

data_image_gen = ImageDataGenerator(...)
data_label_gen = ImageDataGenerator(...)

image_gen = data_image_gen.flow_from_directory(image_directory,
                                               # no need to return labels
                                               class_mode=None,
                                               # don't shuffle, to keep the same order as the labels
                                               shuffle=False)

label_gen = data_label_gen.flow_from_directory(label_directory,
                                               color_mode='grayscale',
                                               # no need to return labels
                                               class_mode=None,
                                               # don't shuffle, to keep the same order as the images
                                               shuffle=False)

def final_gen(image_gen, label_gen):
    for data, labels in zip(image_gen, label_gen):
        # divide labels by 255 to make them like masks, i.e. 0 and 1
        labels /= 255.
        # remove the last axis, i.e. (batch_size, n_rows, n_cols, 1) --> (batch_size, n_rows, n_cols)
        labels = np.squeeze(labels, axis=-1)
        yield data, labels
# ... define your model
# fit the model
model.fit_generator(final_gen(image_gen, label_gen), ...)
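As an optional sanity check (assuming each JPEG image and its PNG mask share the same base name), you can verify before training that the two generators yield files in the same order:

import os
image_names = [os.path.splitext(os.path.basename(f))[0] for f in image_gen.filenames]
label_names = [os.path.splitext(os.path.basename(f))[0] for f in label_gen.filenames]
assert image_names == label_names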
Long story short, I have an RNN that is stacked on top of a CNN.
The CNN was created and trained separately. To clarify things, let's suppose the CNN takes input in the form of a [BATCH SIZE, H, W, C] placeholder (H = height, W = width, C = number of channels).
Now, when stacked on top of the RNN, the overall input to the combined network will have the shape: [BATCH SIZE, TIME SEQUENCE, H, W, C], i.e. each sample in the minibatch consists of TIME_SEQUENCE many images. Moreover, the time sequences are variable in length. There is a separate placeholder called sequence_lengths with shape [BATCH SIZE] that contains scalar values corresponding to the length of each sample in the minibatch. The value of TIME SEQUENCE corresponds to the maximum possible time sequence length, and for samples with smaller lengths, the remaining values are padded with zeros.
What I want to do
I want to accumulate the output from the CNN in a tensor of shape [BATCH SIZE, TIME SEQUENCE, 1] (the last dimension just contains the final score output by the CNN for each time sample for each batch element) so that I can forward this entire chunk of information to the RNN that is stacked on top of the CNN. The tricky thing is, I also want to be able to back-propagate the error from the RNN to the CNN (the CNN is already pre-trained, but I would like to fine-tune the weights a bit), so I have to stay inside the graph, i.e. I can't make any calls to session.run().
Option A:
The easiest way would be to just reshape the overall network input tensor to [BATCH SIZE * TIME SEQUENCE, H, W, C]. The problem with this is that BATCH SIZE * TIME SEQUENCE may be as large as 2000, so I'm bound to run out of memory when trying to feed a batch that big into my CNN. And the batch size is too large for training anyway. Also, a lot of sequences are just padded zeros, and it'd be a waste of computation.
Option B:
Use tf.while_loop. My idea was to treat all the images along the time axis of a single minibatch element as a minibatch for the CNN. Essentially, the CNN would process batches of shape [TIME SEQUENCE, H, W, C] at each iteration (not exactly TIME SEQUENCE many images every time; the exact number depends on the sequence length). The code I have right now looks like this:
# The output tensor that I want populated
image_output_sequence = tf.Variable(tf.zeros([batch_size, max_sequence_length, 1], tf.float32))
# Counter for the loop. I'll process one batch element per iteration.
# One batch element contains a variable number of images for each time step. All these images will form a minibatch for the CNN.
loop_counter = tf.get_variable('loop_counter', dtype=tf.int32, initializer=0)
# Loop variables that will be passed to the body and cond methods
loop_vars = [input_image_sequence, sequence_lengths, image_output_sequence, loop_counter]
# input_image_sequence: [BATCH SIZE, TIME SEQUENCE, H, W, C]
# sequence_lengths: [BATCH SIZE]
# image_output_sequence: [BATCH SIZE, TIME SEQUENCE, 1]
# abbreviations for vars in loop_vars:
# iis --> input_image_sequence
# sl --> sequence_lengths
# ios --> image_output_sequence
# lc --> loop_counter
def cond(iis, sl, ios, lc):
    return tf.less(lc, batch_size)

def body(iis, sl, ios, lc):
    seq_len = sl[lc]  # the sequence length of the current batch element
    cnn_input_batch = iis[lc, :seq_len]  # extract the relevant portion (the rest are just padded zeros)
    # propagate this 'batch' through the CNN; one score per image -> shape [seq_len, 1]
    cnn_output = my_cnn_model.process_input(cnn_input_batch)
    # Pad the remaining time steps with zeros
    padding = [[0, max_sequence_length - seq_len], [0, 0]]
    padded_cnn_output = tf.pad(cnn_output, paddings=padding, mode='CONSTANT', constant_values=0)
    # The problematic part: assign these processed values to the output tensor
    ios[lc].assign(padded_cnn_output)
    return [iis, sl, ios, lc + 1]
_, _, result, _ = tf.while_loop(cond, body, loop_vars, swap_memory=True)
Inside my_cnn_model.process_input, I'm just passing the input through a vanilla CNN. All the variables created in it use reuse=tf.AUTO_REUSE, so that should ensure the while loop reuses the same weights across all iterations.
The exact problem
image_output_sequence is a variable, but somehow when tf.while_loop calls the body method, it gets turned into a Tensor type object to which assignments can't be made. I get the error message: Sliced assignment is only supported for variables
This problem persists even if I use another format like using a tuple of BATCH SIZE Tensors each with dimensions [TIME SEQUENCE, H, W, C].
I'm open to a complete redesign of the code as well, as long as it gets the job done nicely.
The solution is to use an object of type TensorArray, which is specifically made to address such problems. The following line:
image_output_sequence = tf.Variable(tf.zeros([batch_size, max_sequence_length, 1], tf.float32))
is replaced by:
image_output_sequence = tf.TensorArray(size=batch_size, dtype=tf.float32, element_shape=[max_sequence_length, 1], infer_shape=True)
TensorArray doesn't actually require a fixed shape for each element, but for my case it is fixed, so it's better to enforce it.
Then inside the body function, replace this:
ios[lc].assign(padded_cnn_output)
with:
ios = ios.write(lc, padded_cnn_output)
Then after the tf.while_loop statement, the TensorArray can be stacked to form a regular Tensor for further processing:
stacked_tensor = result.stack()
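For completeness, a minimal self-contained sketch of the TensorArray + tf.while_loop pattern (with made-up shapes, and a random tensor standing in for the padded per-element CNN output):

import tensorflow as tf

batch_size, max_sequence_length = 4, 6
# stand-in for the padded per-element CNN scores
scores = tf.random.uniform([batch_size, max_sequence_length, 1])

ta = tf.TensorArray(size=batch_size, dtype=tf.float32,
                    element_shape=[max_sequence_length, 1], infer_shape=True)

def cond(ta, i):
    return tf.less(i, batch_size)

def body(ta, i):
    # in the real model this would be the padded CNN output for element i
    return ta.write(i, scores[i]), i + 1

ta, _ = tf.while_loop(cond, body, [ta, tf.constant(0)])
stacked = ta.stack()   # shape: [batch_size, max_sequence_length, 1]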
I'm trying to build an input pipeline in TensorFlow for image classification, so I want to make batches of images and corresponding labels. The TensorFlow documentation suggests that we can use tf.train.batch to make batches of inputs:
train_batch, train_label_batch = tf.train.batch(
    [train_image, train_image_label],
    batch_size=batch_size,
    num_threads=1,
    capacity=10*batch_size,
    enqueue_many=False,
    shapes=[[224,224,3], [len(labels),]],
    allow_smaller_final_batch=True
)
However, I'm wondering whether it would be a problem if I feed the graph like this:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=train_label_batch, logits=Model(train_batch)))
The question is: does the operation in the cost function dequeue each image together with its corresponding label, or does it dequeue them separately, which would mean training with mismatched images and labels?
There are several things you need to consider to preserve the ordering of images and labels.
Let's say we need a function that gives us images and labels.
def _get_test_images(_train=False):
    """
    Gets the test images and labels as a batch

    Inputs:
    ======
    _train      : Boolean if images are from training set
    random_crop : Boolean if random cropping is allowed
    random_flip : Boolean if random horizontal flip is allowed
    distortion  : Boolean if distortions are allowed

    Outputs:
    ========
    images_batch : Batch of images containing BATCH_SIZE images at a time
    label_batch  : Batch of labels corresponding to the images in images_batch
    idx          : Batch of indexes of images
    """
    # get images and labels
    _, _img_names, _img_class, index = _get_list(_train=_train)

    # total number of distinct images used for train will be equal to the images
    # fed in tf.train.slice_input_producer as _img_names
    img_path, label, idx = tf.train.slice_input_producer([_img_names, _img_class, index], shuffle=False)
    img_path, label, idx = tf.convert_to_tensor(img_path), tf.convert_to_tensor(label), tf.convert_to_tensor(idx)
    img_path = tf.cast(img_path, dtype=tf.string)

    # read file
    image_file = tf.read_file(img_path)

    # decode jpeg/png/bmp
    # tf.image.decode_image won't give shape out, so it will give an error while resizing
    image = tf.image.decode_jpeg(image_file)

    # image preprocessing
    image = tf.image.resize_images(image, [IMG_DIM, IMG_DIM])
    float_image = tf.cast(image, dtype=tf.float32)

    # subtract mean and divide by standard deviation
    float_image = tf.image.per_image_standardization(float_image)

    # set the shape
    float_image.set_shape(IMG_SIZE)
    labels_original = tf.cast(label, dtype=tf.int32)
    img_index = tf.cast(idx, dtype=tf.int32)

    # parameters for shuffle
    batch_size = BATCH_SIZE
    min_fraction_of_examples_in_queue = 0.3
    num_preprocess_threads = 1
    num_examples_per_epoch = MAX_TEST_EXAMPLE
    min_queue_examples = int(num_examples_per_epoch *
                             min_fraction_of_examples_in_queue)

    images_batch, label_batch, idx = tf.train.batch(
        [float_image, label, img_index],
        batch_size=batch_size,
        num_threads=num_preprocess_threads,
        capacity=min_queue_examples + 3 * batch_size)

    # Display the training images in the visualizer.
    tf.summary.image('images', images_batch)

    return images_batch, label_batch, idx
Here, tf.train.slice_input_producer([_img_names, _img_class, index], shuffle=False) is the interesting part: if you set shuffle=True, it will shuffle all three arrays in coordination.
The second thing is num_preprocess_threads. As long as you use a single thread for the dequeue operation, batches come out in a deterministic way. With more than one thread, the arrays may be shuffled randomly; for example, for image 0001.jpg whose true label is 1, you might get 2 or 4. Once dequeued, the data is in tensor form, and tf.nn.softmax_cross_entropy_with_logits shouldn't have a problem with such tensors.
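As a usage sketch (assuming one-hot labels, and the Model function from the question), the pairing is preserved when you feed the batched tensors to the loss:

images_batch, label_batch, idx = _get_test_images(_train=True)
logits = Model(images_batch)
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=label_batch, logits=logits))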