Based on the TensorFlow ConvNet tutorial, a few points are not readily apparent to me:
Are the distorted images actually added to the pool of original images,
or are the distorted images used instead of the originals?
How many distorted images are produced? (i.e., what augmentation factor is used?)
The flow of functions for the tutorial seems to be as follows:
cifar10_train.py

def train():
  """Train CIFAR-10 for a number of steps."""
  with tf.Graph().as_default():
    [...]
    # Get images and labels for CIFAR-10.
    images, labels = cifar10.distorted_inputs()
    [...]
cifar10.py

def distorted_inputs():
  """Construct distorted input for CIFAR training using the Reader ops.

  Returns:
    images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.

  Raises:
    ValueError: If no data_dir
  """
  if not FLAGS.data_dir:
    raise ValueError('Please supply a data_dir')
  data_dir = os.path.join(FLAGS.data_dir, 'cifar-10-batches-bin')
  return cifar10_input.distorted_inputs(data_dir=data_dir,
                                        batch_size=FLAGS.batch_size)
and finally cifar10_input.py

def distorted_inputs(data_dir, batch_size):
  """Construct distorted input for CIFAR training using the Reader ops.

  Args:
    data_dir: Path to the CIFAR-10 data directory.
    batch_size: Number of images per batch.

  Returns:
    images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.
  """
  filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i) for i in xrange(1, 6)]
  for f in filenames:
    if not tf.gfile.Exists(f):
      raise ValueError('Failed to find file: ' + f)

  # Create a queue that produces the filenames to read.
  filename_queue = tf.train.string_input_producer(filenames)

  # Read examples from files in the filename queue.
  read_input = read_cifar10(filename_queue)
  reshaped_image = tf.cast(read_input.uint8image, tf.float32)

  height = IMAGE_SIZE
  width = IMAGE_SIZE

  # Image processing for training the network. Note the many random
  # distortions applied to the image.

  # Randomly crop a [height, width] section of the image.
  distorted_image = tf.random_crop(reshaped_image, [height, width, 3])

  # Randomly flip the image horizontally.
  distorted_image = tf.image.random_flip_left_right(distorted_image)

  # Because these operations are not commutative, consider randomizing
  # the order their operation.
  distorted_image = tf.image.random_brightness(distorted_image, max_delta=63)
  distorted_image = tf.image.random_contrast(distorted_image,
                                             lower=0.2, upper=1.8)

  # Subtract off the mean and divide by the variance of the pixels.
  float_image = tf.image.per_image_whitening(distorted_image)

  # Ensure that the random shuffling has good mixing properties.
  min_fraction_of_examples_in_queue = 0.4
  min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
                           min_fraction_of_examples_in_queue)
  print('Filling queue with %d CIFAR images before starting to train. '
        'This will take a few minutes.' % min_queue_examples)

  # Generate a batch of images and labels by building up a queue of examples.
  return _generate_image_and_label_batch(float_image, read_input.label,
                                         min_queue_examples, batch_size,
                                         shuffle=True)
Are the distorted images actually added to the pool of original images?
It depends on your definition of the pool. In TensorFlow, ops are the basic objects in your network graph, and here data production is itself an op. Thus you do not have a finite set of training samples; instead, you have a potentially infinite set of samples generated from the training set.
Or are the distorted images used instead of the originals?
As you can see from the source you included, a sample is taken from the training batch and then randomly transformed, so the probability of an unaltered image being used is very small (especially since cropping, which always modifies the image, is applied).
How many distorted images are produced? (i.e., what augmentation factor is used?)
There is no such factor; it is a never-ending process. Think of it as random access to a potentially infinite source of data, because that is effectively what is happening here: every batch can differ from the previous one.
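A minimal sketch of what this means in practice (assuming the tutorial's cifar10 module is importable and FLAGS.data_dir points at the extracted CIFAR-10 binaries): every evaluation of the images tensor reads fresh examples and applies new random crops, flips, brightness and contrast changes, so consecutive fetches generally differ.

import tensorflow as tf
import cifar10  # the tutorial module

with tf.Graph().as_default():
    images, labels = cifar10.distorted_inputs()
    with tf.Session() as sess:
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        batch_a = sess.run(images)  # one randomly distorted batch
        batch_b = sess.run(images)  # another, differently distorted batch
        coord.request_stop()
        coord.join(threads)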
Related
I want to use a CNN to segment two classes (binary: "0: object not present, 1: object present"). The training data is 150 images in JPG format, and the ground truth (label data) is another 150 PNG rasters of 0s and 1s (i.e., black-and-white masks).
Now the question is how to load this pairing of training images and label images in Keras/TensorFlow. If there's a dummy example and/or demonstration of how to do that in Python, I would be grateful.
You can define one generator for reading the input images and another for reading the labels using the ImageDataGenerator class and its flow_from_directory() method, and then combine the two in a single generator. Just make sure the directory structure and the (order of) file names of the input and label images are the same:
from keras.preprocessing.image import ImageDataGenerator
import numpy as np

data_image_gen = ImageDataGenerator(...)
data_label_gen = ImageDataGenerator(...)

image_gen = data_image_gen.flow_from_directory(image_directory,
                                               # no need to return labels
                                               class_mode=None,
                                               # don't shuffle, to keep the same order as labels
                                               shuffle=False)

label_gen = data_label_gen.flow_from_directory(label_directory,
                                               color_mode='grayscale',
                                               # no need to return labels
                                               class_mode=None,
                                               # don't shuffle, to keep the same order as images
                                               shuffle=False)

def final_gen(image_gen, label_gen):
    for data, labels in zip(image_gen, label_gen):
        # divide labels by 255 to make them masks, i.e. 0 and 1
        labels /= 255.
        # remove the last axis, i.e. (batch_size, n_rows, n_cols, 1) --> (batch_size, n_rows, n_cols)
        labels = np.squeeze(labels, axis=-1)
        yield data, labels

# ... define your model

# fit the model
model.fit_generator(final_gen(image_gen, label_gen), ...)
Following this tutorial: https://www.tensorflow.org/versions/r1.3/get_started/mnist/pros
I wanted to solve a classification problem with labeled images by myself. Since I'm not using the MNIST database, I spent days creating my own dataset inside TensorFlow. It looks like this:
# variables
batch_size = 50
dimension = 784
stages = 10

# step 1: read dataset
filenames = tf.constant(filenamesList)
labels = tf.constant(labelsList)

# step 2: create Dataset
dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))

# step 3: parse every image in the dataset using `map`
def _parse_function(filename, label):
    # convert label to one-hot encoding
    one_hot = tf.one_hot(label, stages)
    # read image file
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_image(image_string, channels=3)
    image = tf.cast(image_decoded, tf.float32)
    return image, one_hot

# step 4: final input tensor
dataset = dataset.map(_parse_function)
dataset = dataset.batch(batch_size)  # batch_size = 100
iterator = dataset.make_one_shot_iterator()

images, labels = iterator.get_next()
images = tf.reshape(images, [batch_size, dimension]).eval()
labels = tf.reshape(labels, [batch_size, stages]).eval()

for _ in range(10):
    dataset = dataset.shuffle(buffer_size=100)
    dataset = dataset.batch(batch_size)
    iterator = dataset.make_one_shot_iterator()
    images, labels = iterator.get_next()
    images = tf.reshape(images, [batch_size, dimension]).eval()
    labels = tf.reshape(labels, [batch_size, stages]).eval()
    train_step.run(feed_dict={x: images, y_: labels})
Somehow, using a larger batch_size breaks Python. What I'm trying to do is train my neural network with a new batch on each iteration, which is why I'm also using dataset.shuffle(...). Using dataset.shuffle also breaks my Python.
What I wanted to do instead (because shuffle breaks) is batch the whole dataset. By evaluating it ('.eval()') I get a NumPy array, which I can then shuffle with numpy.random.shuffle(images) and take the first few elements of to train on.
e.g.
for _ in range(1000):
    images = tf.reshape(images, [batch_size, dimension]).eval()
    labels = tf.reshape(labels, [batch_size, stages]).eval()
    # shuffle
    np.random.shuffle(images)
    np.random.shuffle(labels)
    train_step.run(feed_dict={x: images[0:train_size], y_: labels[0:train_size]})
But then comes the problem that I can't batch my whole dataset; it looks like the data is too big for Python to work with.
How should I solve this differently?
Since I'm not using the MNIST database, there isn't a function like mnist.train.next_batch(100) to fall back on.
Notice how you call shuffle and batch inside your for loop? This is wrong. Datasets in TF work in the style of functional programming, so you are actually defining a pipeline for preprocessing the data to feed into your model. In a way, you give a recipe that answers the question "given this raw data, which operations (map, etc.) should I do to get batches that I can feed into my neural network?"
Now you are modifying that pipeline on every iteration of your loop! What happens is that in the first iteration the batch shape is, say, [32, 3600]. In the next iteration, elements of this shape are batched again, to [32, 32, 3600], and so on.
There's a great tutorial on the TF website where you can find out more about how Datasets work, but here are a few suggestions for how to resolve your problem.
Move the shuffling to right after "Step 2" in your code. Then you are shuffling the whole dataset, so your batches will have a good mixture of examples. Also increase the buffer_size argument; it works differently than you probably assume. It's usually a good idea to shuffle as early as possible, because shuffling can be slow on a large dataset: the shuffled part of the dataset has to be read into memory. Here it doesn't matter much whether you shuffle the filenames and labels or the decoded images and labels, but the latter is more work because the dataset is larger by that point.
Move batching and the iterator creation to be the last steps, just before starting your training loop.
Don't use feed_dict with Dataset iterators to feed data into your model. Instead, define your model in terms of the outputs of iterator.get_next() and omit the feed_dict argument. See this Q&A for more details: Tensorflow: create minibatch from numpy array > 2 GB.
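Putting those suggestions together, a minimal sketch of the rearranged pipeline might look like the following (filenamesList, labelsList and build_model are placeholders for the questioner's own data and model, not verified code):

import tensorflow as tf

batch_size = 50
stages = 10

def _parse_function(filename, label):
    one_hot = tf.one_hot(label, stages)
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_image(image_string, channels=3)
    image = tf.cast(image_decoded, tf.float32)
    return image, one_hot

filenames = tf.constant(filenamesList)
labels = tf.constant(labelsList)

dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
dataset = dataset.shuffle(buffer_size=len(filenamesList))  # shuffle once, over the whole dataset
dataset = dataset.map(_parse_function)
dataset = dataset.repeat()               # never run out of data during training
dataset = dataset.batch(batch_size)      # batching is the last transformation
iterator = dataset.make_one_shot_iterator()
images, labels = iterator.get_next()

# Build the model directly on the iterator outputs; no feed_dict needed.
logits = build_model(images)             # hypothetical model-building function
loss = tf.losses.softmax_cross_entropy(onehot_labels=labels, logits=logits)
train_step = tf.train.AdamOptimizer().minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        sess.run(train_step)             # each run pulls a fresh batch from the iterator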
I've been running into a lot of problems creating TensorFlow datasets, so I decided to use OpenCV to import the images.
import cv2
import numpy as np

imgDataset = []
for i in range(len(files)):
    imgDataset.append(cv2.imread(files[i]))
imgDataset = np.asarray(imgDataset)
The shape of imgDataset is (num_img, height, width, col_channels), so the i-th image is imgDataset[i].
Shuffling the dataset and taking just one batch of it can be done like this:
from sklearn.utils import shuffle

X, y = shuffle(X, y)
X_feed = X[:batch_size]
y_feed = y[:batch_size]
Then you feed X_feed and y_feed into your model.
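To iterate over the whole array in minibatches rather than taking a single batch, a sketch along these lines should work (x, y_ and train_step are assumed to be the placeholders and training op from the questioner's graph, running inside a default session):

from sklearn.utils import shuffle

num_epochs = 10
batch_size = 50

for epoch in range(num_epochs):
    X, y = shuffle(X, y)                          # reshuffle images and labels together each epoch
    for start in range(0, len(X), batch_size):
        X_feed = X[start:start + batch_size]
        y_feed = y[start:start + batch_size]
        train_step.run(feed_dict={x: X_feed, y_: y_feed})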
I'm trying to build an input pipeline in TensorFlow for image classification, so I want to make batches of images and their corresponding labels. The TensorFlow documentation suggests that we can use tf.train.batch to make batches of inputs:
train_batch, train_label_batch = tf.train.batch(
    [train_image, train_image_label],
    batch_size=batch_size,
    num_threads=1,
    capacity=10*batch_size,
    enqueue_many=False,
    shapes=[[224,224,3], [len(labels),]],
    allow_smaller_final_batch=True
)
However, I'm wondering whether it would be a problem if I feed the graph like this:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=train_label_batch, logits=Model(train_batch)))
The question is: does the operation in the cost function dequeue the images and their corresponding labels together, or does it return them separately, thereby training on mismatched images and labels?
There are several things you need to consider to preserve the ordering of images and labels.
Let's say we need a function that gives us images and labels.
def _get_test_images(_train=False):
    """
    Gets the test images and labels as a batch

    Inputs:
    ======
    _train      : Boolean if images are from training set
    random_crop : Boolean if random cropping is allowed
    random_flip : Boolean if random horizontal flip is allowed
    distortion  : Boolean if distortions are allowed

    Outputs:
    ========
    images_batch : Batch of images containing BATCH_SIZE images at a time
    label_batch  : Batch of labels corresponding to the images in images_batch
    idx          : Batch of indexes of images
    """
    # get images and labels
    _, _img_names, _img_class, index = _get_list(_train=_train)

    # total number of distinct images used for train will be equal to the images
    # fed in tf.train.slice_input_producer as _img_names
    img_path, label, idx = tf.train.slice_input_producer(
        [_img_names, _img_class, index], shuffle=False)
    img_path, label, idx = (tf.convert_to_tensor(img_path),
                            tf.convert_to_tensor(label),
                            tf.convert_to_tensor(idx))
    img_path = tf.cast(img_path, dtype=tf.string)

    # read file
    image_file = tf.read_file(img_path)

    # decode jpeg/png/bmp
    # tf.image.decode_image won't give shape out, so it will give an error while resizing
    image = tf.image.decode_jpeg(image_file)

    # image preprocessing
    image = tf.image.resize_images(image, [IMG_DIM, IMG_DIM])
    float_image = tf.cast(image, dtype=tf.float32)

    # subtract mean and divide by standard deviation
    float_image = tf.image.per_image_standardization(float_image)

    # set the shape
    float_image.set_shape(IMG_SIZE)

    labels_original = tf.cast(label, dtype=tf.int32)
    img_index = tf.cast(idx, dtype=tf.int32)

    # parameters for shuffle
    batch_size = BATCH_SIZE
    min_fraction_of_examples_in_queue = 0.3
    num_preprocess_threads = 1
    num_examples_per_epoch = MAX_TEST_EXAMPLE
    min_queue_examples = int(num_examples_per_epoch *
                             min_fraction_of_examples_in_queue)

    images_batch, label_batch, idx = tf.train.batch(
        [float_image, label, img_index],
        batch_size=batch_size,
        num_threads=num_preprocess_threads,
        capacity=min_queue_examples + 3 * batch_size)

    # Display the training images in the visualizer.
    tf.summary.image('images', images_batch)

    return images_batch, label_batch, idx
Here, tf.train.slice_input_producer([_img_names, _img_class, index], shuffle=False) is the interesting thing to look at: if you pass shuffle=True, it will shuffle all three arrays in coordination.
The second thing is num_preprocess_threads. As long as you use a single thread for the dequeue operation, batches come out deterministically. With more than one thread, the arrays are shuffled randomly; for example, for image 0001.jpg whose true label is 1, you might get 2 or 4. Once it is dequeued, it is in tensor form, and tf.nn.softmax_cross_entropy_with_logits shouldn't have a problem with such tensors.
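As a rough usage sketch (assuming the usual TF 1.x queue-runner boilerplate), the three outputs are enqueued together, so every sess.run returns a consistently paired image/label/index triple:

import tensorflow as tf

images_batch, label_batch, idx = _get_test_images(_train=True)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    # one batch: imgs[i], lbls[i] and ids[i] all describe the same example
    imgs, lbls, ids = sess.run([images_batch, label_batch, idx])
    coord.request_stop()
    coord.join(threads)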
I want TensorFlow's Inception v3 to give out tags for an image. My goal is to convert a JPEG image into input accepted by the Inception neural network; I don't know how to process the images first so that they can be run through Google's Inception v3 model. The original TensorFlow project is here:
https://github.com/tensorflow/models/tree/master/inception
Originally, all the images are in a dataset, and the entire dataset is first passed to input() or distorted_inputs() in ImageProcessing.py. The images in the dataset are processed and passed to the train() or eval() methods (both of these work). The problem is that I want a function that prints out tags for one specific image (not a dataset).
Below is the code for the inference function that is used to generate tags with Google Inception. The inception_v4 function is a convolutional neural network implemented in TensorFlow.
def inference(images, num_classes, for_training=False, restore_logits=True,
              scope=None):
  """Build Inception v3 model architecture.

  See here for reference: http://arxiv.org/abs/1512.00567

  Args:
    images: Images returned from inputs() or distorted_inputs().
    num_classes: number of classes
    for_training: If set to `True`, build the inference model for training.
      Kernels that operate differently for inference during training
      e.g. dropout, are appropriately configured.
    restore_logits: whether or not the logits layers should be restored.
      Useful for fine-tuning a model with different num_classes.
    scope: optional prefix string identifying the ImageNet tower.

  Returns:
    Logits. 2-D float Tensor.
    Auxiliary Logits. 2-D float Tensor of side-head. Used for training only.
  """
  # Parameters for BatchNorm.
  batch_norm_params = {
      # Decay for the moving averages.
      'decay': BATCHNORM_MOVING_AVERAGE_DECAY,
      # epsilon to prevent 0s in variance.
      'epsilon': 0.001,
  }
  # Set weight_decay for weights in Conv and FC layers.
  with slim.arg_scope([slim.ops.conv2d, slim.ops.fc], weight_decay=0.00004):
    with slim.arg_scope([slim.ops.conv2d],
                        stddev=0.1,
                        activation=tf.nn.relu,
                        batch_norm_params=batch_norm_params):
      logits, endpoints = inception_v4(
          images,
          dropout_keep_prob=0.8,
          num_classes=num_classes,
          is_training=for_training,
          scope=scope)

  # Add summaries for viewing model statistics on TensorBoard.
  _activation_summaries(endpoints)

  # Grab the logits associated with the side head. Employed during training.
  auxiliary_logits = endpoints['AuxLogits']

  return logits, auxiliary_logits
This is my attempt to process the image before it is passed to the inference function.
def process_image(self, image_path):
    filename_queue = tf.train.string_input_producer(image_path)
    reader = tf.WholeFileReader()
    key, value = reader.read(filename_queue)
    img = tf.image.decode_jpeg(value)
    height = self.image_size
    width = self.image_size
    image_data = tf.cast(img, tf.float32)
    image_data = tf.reshape(image_data, shape=[1, height, width, 3])
    return image_data
I simply wanted to process an image file so that I could pass it to the inference function and have it print out the tags. The above code didn't work and printed this error:
ValueError: Shape () must have rank at least 1
I would appreciate it if anyone could provide some insight into this problem.
Inception just needs (299, 299, 3) images with inputs scaled between -1 and 1. See the code below. I just transform the images using this and put them in a TFRecord (and then a queue) to run my stuff.
from PIL import Image
import PIL
import numpy as np

def load_image(self, image_path):
    img = Image.open(image_path)
    newImg = img.resize((299, 299), PIL.Image.BILINEAR).convert("RGB")
    data = np.array(newImg.getdata())
    return 2 * (data.reshape((newImg.size[0], newImg.size[1], 3)).astype(np.float32) / 255) - 1
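As a hypothetical usage sketch (load_image above is written as a method, so assume it is bound to some object obj; the placeholder and NUM_CLASSES names are illustrative, not part of the original code), you would add a batch dimension before handing the array to inference():

import numpy as np
import tensorflow as tf

img = obj.load_image('path/to/image.jpg')       # (299, 299, 3), values in [-1, 1]
batch = np.expand_dims(img, axis=0)             # (1, 299, 299, 3): inference() expects a batch

images_placeholder = tf.placeholder(tf.float32, shape=(1, 299, 299, 3))
# logits, _ = inference(images_placeholder, num_classes=NUM_CLASSES, for_training=False)
# ...then run the logits op with feed_dict={images_placeholder: batch}.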
I'm fine-tuning the GoogleNet network with Caffe on my own dataset. If I use an IMAGE_DATA layer as input, learning takes place. However, I need to switch to an HDF5 layer for further extensions that I require. When I use HDF5 layers, no learning takes place.
I am using exactly the same input images, and the labels match as well. I have also checked that the data in the .h5 files can be loaded correctly: it can, and Caffe finds the number of examples I feed it as well as the correct number of classes (2).
This leads me to think that the issue lies in the transformations I am performing manually (since HDF5 layers do not perform any built-in transformations). The code for these is below. I do the following:
Convert image from RGB to BGR
Resize it to 256x256 so I can subtract the mean file from ImageNet (included in the Caffe library)
Since the original GoogleNet prototxt does not divide by 255, I also do not (see here)
I resize the image down to 224x224, which is the crop size required by GoogleNet
I transpose the image as needed to satisfy CxHxW, as required by Caffe
At the moment I am not performing data augmentation, but it could be turned on by setting oversample=True.
Can anyone see anything wrong with this approach? Is data augmentation so critical that no learning would take place without it?
The HDF5 conversion code
import numpy as np
from scipy import ndimage
import caffe

IMG_RESHAPE = 224
IMG_UNCROPPED = 256

def resize_convert(img_names, path=None, oversample=False):
    '''
    Load images, set to BGR mode and transpose to CxHxW
    and subtract the Imagenet mean. If oversample is True,
    perform data augmentation.

    Parameters:
    ---------
    img_names (list): list of image names to be processed.
    path (string): path to images.
    oversample (bool): if True then data augmentation is performed
        on each image, and 10 crops of size 224x224 are produced
        from each image. If False, then a single 224x224 is produced.
    '''
    path = path if path is not None else ''
    if oversample == False:
        all_imgs = np.empty((len(img_names), 3, IMG_RESHAPE, IMG_RESHAPE), dtype='float32')
    else:
        all_imgs = np.empty((len(img_names), 3, IMG_UNCROPPED, IMG_UNCROPPED), dtype='float32')

    # load the imagenet mean
    mean_val = np.load('/path/to/imagenet/ilsvrc_2012_mean.npy')

    for i, img_name in enumerate(img_names):
        img = ndimage.imread(path + img_name, mode='RGB')  # Read as HxWxC

        # subtract the mean of Imagenet
        # First, resize to 256 so we can subtract the mean of dims 256x256
        img = img[..., ::-1]  # Convert RGB TO BGR
        img = caffe.io.resize_image(img, (IMG_UNCROPPED, IMG_UNCROPPED), interp_order=1)
        img = np.transpose(img, (2, 0, 1))  # HxWxC => CxHxW

        # Since mean is given in Caffe channel order: 3xWxH
        # Assume it also is given in BGR order
        img = img - mean_val

        # set to 0-1 range => I don't think googleNet requires this
        # I tried both and it didn't make a difference
        # img = img/255

        # resize images down since GoogleNet accepts 224x224 crops
        if oversample == False:
            img = np.transpose(img, (1, 2, 0))  # CxHxW => HxWxC
            img = caffe.io.resize_image(img, (IMG_RESHAPE, IMG_RESHAPE), interp_order=1)
            img = np.transpose(img, (2, 0, 1))  # convert to CxHxW for Caffe
            all_imgs[i, :, :, :] = img

    # oversampling requires HxWxC order
    if oversample:
        all_imgs = np.transpose(all_imgs, (0, 3, 1, 2))
        all_imgs = caffe.io.oversample(all_imgs, (IMG_RESHAPE, IMG_RESHAPE))
        all_imgs = np.transpose(all_imgs, (0, 2, 3, 1))  # convert to CxHxW for Caffe

    return all_imgs
Relevant differences between IMAGE_DATA and HDF5 prototxt files
name: "GoogleNet"
layers {
name: "data"
type: HDF5_DATA
top: "data"
top: "label"
hdf5_data_param {
source: "/path/to/train_list.txt"
batch_size: 32
}
include: { phase: TRAIN }
}
layers {
name: "data"
type: HDF5_DATA
top: "data"
top: "label"
hdf5_data_param {
source: "/path/to/valid_list.txt"
batch_size:10
}
include: { phase: TEST }
}
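For reference, a hypothetical sketch of how the .h5 files referenced by train_list.txt could be written (the file paths and label_list are placeholders): the HDF5 dataset names must match the top blob names in the prototxt above, i.e. 'data' and 'label'.

import h5py
import numpy as np

all_imgs = resize_convert(img_names, path='/path/to/images/')
labels = np.array(label_list, dtype=np.float32)   # one label per image, same order as img_names

with h5py.File('/path/to/train_data.h5', 'w') as f:
    f.create_dataset('data', data=all_imgs.astype(np.float32))
    f.create_dataset('label', data=labels)

# /path/to/train_list.txt then contains one line per .h5 file, e.g.:
# /path/to/train_data.h5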
Update
When I say no learning is taking place, I mean that my training loss does not go down consistently with HDF5 data compared to IMAGE_DATA. In the plots below, the first shows the change in training loss for the IMAGE_DATA network, and the second shows it for the HDF5-data network.
One possibility I am considering is that the network is overfitting to each of the .h5 files that I am feeding it. At the moment I am using data augmentation, but all of the augmented examples are stored in a single .h5 file, along with the other examples. However, because all of the augmented versions of a single input image are contained in the same .h5 file, I think this could cause the network to overfit to that specific .h5 file. I am not sure whether this is what the second plot suggests, though.
I faced the same problem and found out that, for some reason, doing the transformation manually as you do in your code makes the images all black (all zeros). Try to debug your code and see if that is happening.
The solution is to use the same methodology explained in the Caffe tutorial here:
http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/00-classification.ipynb
The part where you see:
# create transformer for the input called 'data'
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1)) # move image channels to outermost dimension
transformer.set_mean('data', mu) # subtract the dataset-mean value in each channel
transformer.set_raw_scale('data', 255) # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2,1,0)) # swap channels from RGB to BGR
Then, a few lines down:
image = caffe.io.load_image(caffe_root + 'examples/images/cat.jpg')
transformed_image = transformer.preprocess('data', image)
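For completeness, the same notebook also shows where mu comes from and how the transformed image is then fed to the network; roughly along these lines (caffe_root and the net definition follow the notebook, so treat the exact paths as assumptions):

import numpy as np
import caffe

# mean ImageNet image, as shipped with Caffe
mu = np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy')
mu = mu.mean(1).mean(1)   # average over pixels to get the mean (BGR) pixel values

# copy the preprocessed image into the net and run a forward pass
net.blobs['data'].data[...] = transformed_image
output = net.forward()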