I want to make tensorflow's inception v3 to give out tags for an image. My goal is to convert a JPEG image to input that is accepted by inception neural network. I don't know how to process the images first so that it can run with Google Inception's v3 model. The original tensorflow project is here:
https://github.com/tensorflow/models/tree/master/inception
Originally, all the images are in a dataset and the entire dataset is first passed to input() or distorted_inputs() in ImageProcessing.py . The images in dataset are processed and passed to the train() or eval() methods (both of these work). The problem is I want a function to print out tags for one specific image (not dataset).
Below is the code for inference function that is used to generate tag with google inception. inceptionv4 function is a convolutional neural network implemented in tensorflow.
def inference(images, num_classes, for_training=False, restore_logits=True,
scope=None):
"""Build Inception v3 model architecture.
See here for reference: http://arxiv.org/abs/1512.00567
Args:
images: Images returned from inputs() or distorted_inputs().
num_classes: number of classes
for_training: If set to `True`, build the inference model for training.
Kernels that operate differently for inference during training
e.g. dropout, are appropriately configured.
restore_logits: whether or not the logits layers should be restored.
Useful for fine-tuning a model with different num_classes.
scope: optional prefix string identifying the ImageNet tower.
Returns:
Logits. 2-D float Tensor.
Auxiliary Logits. 2-D float Tensor of side-head. Used for training only.
"""
# Parameters for BatchNorm.
batch_norm_params = {
# Decay for the moving averages.
'decay': BATCHNORM_MOVING_AVERAGE_DECAY,
# epsilon to prevent 0s in variance.
'epsilon': 0.001,
}
# Set weight_decay for weights in Conv and FC layers.
with slim.arg_scope([slim.ops.conv2d, slim.ops.fc], weight_decay=0.00004):
with slim.arg_scope([slim.ops.conv2d],
stddev=0.1,
activation=tf.nn.relu,
batch_norm_params=batch_norm_params):
logits, endpoints = inception_v4(
images,
dropout_keep_prob=0.8,
num_classes=num_classes,
is_training=for_training,
scope=scope)
# Add summaries for viewing model statistics on TensorBoard.
_activation_summaries(endpoints)
# Grab the logits associated with the side head. Employed during training.
auxiliary_logits = endpoints['AuxLogits']
return logits, auxiliary_logits
This is my attempt to process the image before it is passed to inference function.
def process_image(self, image_path):
filename_queue = tf.train.string_input_producer(image_path)
reader = tf.WholeFileReader()
key, value = reader.read(filename_queue)
img = tf.image.decode_jpeg(value)
height = self.image_size
width = self.image_size
image_data = tf.cast(img, tf.float32)
image_data = tf.reshape(image_data, shape=[1, height, width, 3])
return image_data
I wanted to process an image file simply so that I can pass it to the inference function. And that inference prints out the tags. The above code didn't work and printed error:
ValueError: Shape () must have rank at least 1
I appreciate if anyone can provide any insight into this problem.
Inception just needs (299,299,3) images with inputs scaled between -1 and 1. See code below. I just change the images using this and put them in a TFRecord ( and then queue ) to run my stuff.
from PIL import Image
import PIL
import numpy as np
def load_image( self, image_path ):
img = Image.open( image_path )
newImg = img.resize((299,299), PIL.Image.BILINEAR).convert("RGB")
data = np.array( newImg.getdata() )
return 2*( data.reshape( (newImg.size[0], newImg.size[1], 3) ).astype( np.float32 )/255 ) - 1
Related
I have a tensorflow keras model trained with tensorflow 2.3. The model takes as input an image, however the model was trained with scaled inputs and therefore we have to scale the image by 255 before inputting them into the model.
As we use this model across a variety of platforms, I am trying to simplify this by modifying the model to simply insert a rescale layer at the start of the keras model (i.e. immediately after the input). Therefore any future consumption of this model can simply pass an image without having to scale them.
I am having a lot of trouble getting this to work. I understand I need to use the following function to create a rescaling layer;
tf.keras.layers.experimental.preprocessing.Rescaling(255, 0.0, "rescaling")
But I am unsure how to insert this to the start of the model.
Thank you in advance
you can insert this layer at the top of your trained model. below an example where first we train a model manual scaling the input and the we using the same trained model but adding at the top a Rescaling layer
from tensorflow.keras.layers.experimental.preprocessing import Rescaling
# generate dummy data
input_dim = (28,28,3)
n_sample = 10
X = np.random.randint(0,255, (n_sample,)+input_dim)
y = np.random.uniform(0,1, (n_sample,))
# create base model
inp = Input(input_dim)
x = Conv2D(8, (3,3))(inp)
x = Flatten()(x)
out = Dense(1)(x)
# fit base model with manual scaling
model = Model(inp, out)
model.compile('adam', 'mse')
model.fit(X/255, y, epochs=3)
# create new model with pretrained weight + rescaling at the top
inp = Input(input_dim)
scaled_input = Rescaling(1/255, 0.0, "rescaling")(inp)
out = model(scaled_input)
scaled_model = Model(inp, out)
# compare prediction with manual scaling vs layer scaling
pred = model.predict(X/255)
pred_scaled = scaled_model.predict(X)
(pred.round(5) == pred_scaled.round(5)).all() # True
Rescaling the images is part of data preprocessing, also rescaling images is called image normalization, this process is useful for providing a uniform scale for the dataset or numerical values you are using before building your model.In keras you can do this in many ways using one of the following according to your target:
If you are training using an Artificial neural network model you can use:-
"Batch normalization layer" or "Layer Normalization" or by the rescale method of keras you mentioned. You can look at this resource for more information about normalization .
https://machinelearningknowledge.ai/keras-normalization-layers-explained-for-beginners-batch-normalization-vs-layer-normalization/
to use the rescale method you mentioned:
#importing you libraries 1st
import tensorflow as tf
from tensorflow.keras.layers import BatchNormalization
#if your are using dataset from directory
import pathlib
then import your Dataset:
Dataset_Dir = '/Dataset/ path'
image size = (256,256) #the image size in your dataset
image shape = (96,96,3) #The shape you wish for your images in your network
Then divide your dataset to train-test I use 70-30 percent
Training_set = tf.keras.preprocessing.image_dataset_from_directory(Dataset_Dir,batch_size= 32,
image_size= image_size,
validation_split= 0.3,subset = "training",seed =123)
Test set
Testing_set = tf.keras.preprocessing.image_dataset_from_directory(Dataset_Dir,image_size= image_size,
validation_split=0.3,seed=123,subset ="validation")
normalization layer:
normalization_layer = tf.keras.layers.experimental.preprocessing.Rescaling(1./255)
normalized_training_set = Training_set.map(lambda x, y: (normalization_layer(x), y))
training_image_batch,training_labels_batch = next(iter(normalized_training_set))
for more about this method too:
look at tensorflow tutorial:
https://www.tensorflow.org/tutorials/images/classification
I am using an exported classification model from Google AutoML Vision, hence I only have a saved_model.pb and no variables, checkpoints etc.
I want to load this model graph into a local TensorFlow installation, use it for inference and continue training with more pictures.
Main questions:
Is this plan possible, i.e. to use a single saved_model.pb without variables, checkpoints etc. and train the resulting graph with new data?
If yes: How do you get to an input shape of (?,) with images encoded as strings?
Ideally, looking ahead: Any important thing to consider for the training part?
Background infos about code:
To read the image, I use the same approach as you would when using the Docker container for inference, hence base64 encoded image.
To load the graph, I checked what tag set the graph needs via CLI (saved_model_cli show --dir input/model) which is serve.
To get input tensor names I use graph.get_operations(), which gives me Placeholder:0 for image_bytes and Placeholder:1_0 for the key (just an arbitrary string identify the image). Both have Dimension dim -1
import tensorflow as tf
import numpy as np
import base64
path_img = "input/testimage.jpg"
path_mdl = "input/model"
# input to network expected to be base64 encoded image
with io.open(path_img, 'rb') as image_file:
encoded_image = base64.b64encode(image_file.read()).decode('utf-8')
# reshaping to (1,) as the expecte dimension is (?,)
feed_dict_option1 = {
"Placeholder:0": { np.array(str(encoded_image)).reshape(1,) },
"Placeholder_1:0" : "image_key"
}
# reshaping to (1,1) as the expecte dimension is (?,)
feed_dict_option2 = {
"Placeholder:0": np.array(str(encoded_image)).reshape(1,1),
"Placeholder_1:0" : "image_key"
}
with tf.Session(graph=tf.Graph()) as sess:
tf.saved_model.loader.load(sess, ["serve"], path_mdl)
graph = tf.get_default_graph()
sess.run('scores:0',
feed_dict=feed_dict_option1)
sess.run('scores:0',
feed_dict=feed_dict_option2)
Output:
# for input reshaped to (1,)
ValueError: Cannot feed value of shape (1,) for Tensor 'Placeholder:0', which has shape '(?,)'
# for input reshaped to (1,1)
ValueError: Cannot feed value of shape (1, 1) for Tensor 'Placeholder:0', which has shape '(?,)'
How do you get to an input shape of (?,)?
Thanks a lot.
Yes! It is possible, I have an object detection model that should be similar, I can run it as follows in tensorflow 1.14.0:
import cv2
cv2.imread(filepath)
flag, bts = cv.imencode('.jpg', img)
inp = [bts[:,0].tobytes()]
out = sess.run([sess.graph.get_tensor_by_name('num_detections:0'),
sess.graph.get_tensor_by_name('detection_scores:0'),
sess.graph.get_tensor_by_name('detection_boxes:0'),
sess.graph.get_tensor_by_name('detection_classes:0')],
feed_dict={'encoded_image_string_tensor:0': inp})
I used netron to find my input.
In tensorflow 2.0 it is even easier:
import cv2
cv2.imread(filepath)
flag, bts = cv.imencode('.jpg', img)
inp = [bts[:,0].tobytes()]
saved_model_dir = '.'
loaded = tf.saved_model.load(export_dir=saved_model_dir)
infer = loaded.signatures["serving_default"]
out = infer(key=tf.constant('something_unique'), image_bytes=tf.constant(inp))
Also saved_model.pb is not a frozen_inference_graph.pb, see: What is difference frozen_inference_graph.pb and saved_model.pb?
I have created a Deep Convolution Neural Network to classify individual pixels in an image. My training data will always be the same size (32x32x7), but my testing data can be any size.
Github Repository
Currently, my model will only work on images that are the same size. I have used the tensorflow mnist tutorial extensively to help me construct my model. In this tutorial, we only use 28x28 images. How would the following mnist model be changed to accept images of any size?
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
x_image = tf.reshape(x, [-1, 28, 28, 1])
To make things a little bit more complicated, my model has transpose convolutions where the output shape needs to be specified. How would I adjust the following line of code so that the transpose convolution will output a shape that is the same size of the input.
DeConnv1 = tf.nn.conv3d_transpose(layer1, filter = w, output_shape = [1,32,32,7,1], strides = [1,2,2,2,1], padding = 'SAME')
Unfortunately there's no way to build dynamic graphs in Tensorflow (You could try with fold but that's outside the scope of the question). This leaves you with two options:
Bucketing: You create multiple input tensors in a few hand picked sizes and then in runtime you choose the right bucket (see example). Either way you'll probably need the second option. Seq2seq with bucketing
Resize the input and output images.
Assuming the images all maintain the same aspect ration you can try resizing the image before inference. Not sure why you care about the output since MNIST is a classification task.
Either way you can use the same approach:
from PIL import Image
basewidth = 28 # MNIST image width
img = Image.open('your_input_img.jpg')
wpercent = (basewidth/float(img.size[0]))
hsize = int((float(img.size[1])*float(wpercent)))
img = img.resize((basewidth,hsize), Image.ANTIALIAS)
# Save image or feed directly to tensorflow
img.save('feed_to_tf.jpg')
The mnist model code which you mentioned is an example using FC networks and not for convolution networks. The input shape of [None,784] is given for mnist size (28 x 28). The example is a FC network which has fixed input size.
What you are asking for is not possible in FC networks because the number of weights and biases are dependent on the input shape. This is possible if you are using a Fully convolution architecture. So my suggestion is to use a fully convolution architecture so that the weights and biases are not dependent on the input shape
Adding to #gidim's answer, here is how you can resize the images in Tensorflow, and feed the results directly to your inference. Note: This method scales and distorts the image, which might increase your loss.
All credit goes to Prasad Pai's article on Data Augmentation.
import tensorflow as tf
import numpy as np
from PIL import Image
IMAGE_SIZE = 32
CHANNELS = 1
def tf_resize_images(X_img_file_paths):
X_data = []
tf.reset_default_graph()
X = tf.placeholder(tf.float32, (None, None, CHANNELS))
tf_img = tf.image.resize_images(X, (IMAGE_SIZE, IMAGE_SIZE),
tf.image.ResizeMethod.NEAREST_NEIGHBOR)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
# Each image is resized individually as different image may be of different size.
for index, file_path in enumerate(X_img_file_paths):
img = Image.open(file_path)
resized_img = sess.run(tf_img, feed_dict = {X: img})
X_data.append(resized_img)
X_data = np.array(X_data, dtype = np.float32) # Convert to numpy
return X_data
I think it would be immensely helpful to the Tensorflow community if there was a well-documented solution to the crucial task of testing a single new image against the model created by the convnet in the CIFAR-10 tutorial.
I may be wrong, but this critical step that makes the trained model usable in practice seems to be lacking. There is a "missing link" in that tutorial—a script that would directly load a single image (as array or binary), compare it against the trained model, and return a classification.
Prior answers give partial solutions that explain the overall approach, but none of which I've been able to implement successfully. Other bits and pieces can be found here and there, but unfortunately haven't added up to a working solution. Kindly consider the research I've done, before tagging this as duplicate or already answered.
Tensorflow: how to save/restore a model?
Restoring TensorFlow model
Unable to restore models in tensorflow v0.8
https://gist.github.com/nikitakit/6ef3b72be67b86cb7868
The most popular answer is the first, in which #RyanSepassi and #YaroslavBulatov describe the problem and an approach: one needs to "manually construct a graph with identical node names, and use Saver to load the weights into it". Although both answers are helpful, it is not apparent how one would go about plugging this into the CIFAR-10 project.
A fully functional solution would be highly desirable so we could port it to other single image classification problems. There are several questions on SO in this regard that ask for this, but still no full answer (for example Load checkpoint and evaluate single image with tensorflow DNN).
I hope we can converge on a working script that everyone could use.
The below script is not yet functional, and I'd be happy to hear from you on how this can be improved to provide a solution for single-image classification using the CIFAR-10 TF tutorial trained model.
Assume all variables, file names etc. are untouched from the original tutorial.
New file: cifar10_eval_single.py
import cv2
import tensorflow as tf
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_string('eval_dir', './input/eval',
"""Directory where to write event logs.""")
tf.app.flags.DEFINE_string('checkpoint_dir', './input/train',
"""Directory where to read model checkpoints.""")
def get_single_img():
file_path = './input/data/single/test_image.tif'
pixels = cv2.imread(file_path, 0)
return pixels
def eval_single_img():
# below code adapted from #RyanSepassi, however not functional
# among other errors, saver throws an error that there are no
# variables to save
with tf.Graph().as_default():
# Get image.
image = get_single_img()
# Build a Graph.
# TODO
# Create dummy variables.
x = tf.placeholder(tf.float32)
w = tf.Variable(tf.zeros([1, 1], dtype=tf.float32))
b = tf.Variable(tf.ones([1, 1], dtype=tf.float32))
y_hat = tf.add(b, tf.matmul(x, w))
saver = tf.train.Saver()
with tf.Session() as sess:
sess.run(tf.initialize_all_variables())
ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
if ckpt and ckpt.model_checkpoint_path:
saver.restore(sess, ckpt.model_checkpoint_path)
print('Checkpoint found')
else:
print('No checkpoint found')
# Run the model to get predictions
predictions = sess.run(y_hat, feed_dict={x: image})
print(predictions)
def main(argv=None):
if tf.gfile.Exists(FLAGS.eval_dir):
tf.gfile.DeleteRecursively(FLAGS.eval_dir)
tf.gfile.MakeDirs(FLAGS.eval_dir)
eval_single_img()
if __name__ == '__main__':
tf.app.run()
There are two methods to feed a single new image to the cifar10 model. The first method is a cleaner approach but requires modification in the main file, hence will require retraining. The second method is applicable when a user does not want to modify the model files and instead wants to use the existing check-point/meta-graph files.
The code for the first approach is as follows:
import tensorflow as tf
import numpy as np
import cv2
sess = tf.Session('', tf.Graph())
with sess.graph.as_default():
# Read meta graph and checkpoint to restore tf session
saver = tf.train.import_meta_graph("/tmp/cifar10_train/model.ckpt-200.meta")
saver.restore(sess, "/tmp/cifar10_train/model.ckpt-200")
# Read a single image from a file.
img = cv2.imread('tmp.png')
img = np.expand_dims(img, axis=0)
# Start the queue runners. If they are not started the program will hang
# see e.g. https://www.tensorflow.org/programmers_guide/reading_data
coord = tf.train.Coordinator()
threads = []
for qr in sess.graph.get_collection(tf.GraphKeys.QUEUE_RUNNERS):
threads.extend(qr.create_threads(sess, coord=coord, daemon=True,
start=True))
# In the graph created above, feed "is_training" and "imgs" placeholders.
# Feeding them will disconnect the path from queue runners to the graph
# and enable a path from the placeholder instead. The "img" placeholder will be
# fed with the image that was read above.
logits = sess.run('softmax_linear/softmax_linear:0',
feed_dict={'is_training:0': False, 'imgs:0': img})
#Print classifiction results.
print(logits)
The script requires that a user creates two placeholders and a conditional execution statement for it to work.
The placeholders and conditional execution statement are added in cifar10_train.py as shown below:
def train():
"""Train CIFAR-10 for a number of steps."""
with tf.Graph().as_default():
global_step = tf.contrib.framework.get_or_create_global_step()
with tf.device('/cpu:0'):
images, labels = cifar10.distorted_inputs()
is_training = tf.placeholder(dtype=bool,shape=(),name='is_training')
imgs = tf.placeholder(tf.float32, (1, 32, 32, 3), name='imgs')
images = tf.cond(is_training, lambda:images, lambda:imgs)
logits = cifar10.inference(images)
The inputs in cifar10 model are connected to queue runner object which is a multistage queue that can prefetch data from files in parallel. See a nice animation of queue runner here
While queue runners are efficient in prefetching large dataset for training, they are an overkill for inference/testing where only a single file is needed to be classified, also they are a bit more involved to modify/maintain.
For that reason, I have added a placeholder "is_training", which is set to False while training as shown below:
import numpy as np
tmp_img = np.ndarray(shape=(1,32,32,3), dtype=float)
with tf.train.MonitoredTrainingSession(
checkpoint_dir=FLAGS.train_dir,
hooks=[tf.train.StopAtStepHook(last_step=FLAGS.max_steps),
tf.train.NanTensorHook(loss),
_LoggerHook()],
config=tf.ConfigProto(
log_device_placement=FLAGS.log_device_placement)) as mon_sess:
while not mon_sess.should_stop():
mon_sess.run(train_op, feed_dict={is_training: True, imgs: tmp_img})
Another placeholder "imgs" holds a tensor of shape (1,32,32,3) for the image that will be fed during inference -- the first dimension is the batch size which is one in this case. I have modified cifar model to accept 32x32 images instead of 24x24 as the original cifar10 images are 32x32.
Finally, the conditional statement feeds the placeholder or queue runner output to the graph. The "is_training" placeholder is set to False during inference and "img" placeholder is fed a numpy array -- the numpy array is reshaped from 3 to 4 dimensional vector to conform to the input tensor to inference function in the model.
That is all there is to it. Any model can be inferred with a single/user defined test data like shown in the script above. Essentially read the graph, feed data to the graph nodes and run the graph to get the final output.
Now the second method. The other approach is to hack cifar10.py and cifar10_eval.py to change batch size to one and replace the data coming from the queue runner with the one read from a file.
Set batch size to 1:
tf.app.flags.DEFINE_integer('batch_size', 1,
"""Number of images to process in a batch.""")
Call inference with an image file read.
def evaluate(): with tf.Graph().as_default() as g:
# Get images and labels for CIFAR-10.
eval_data = FLAGS.eval_data == 'test'
images, labels = cifar10.inputs(eval_data=eval_data)
import cv2
img = cv2.imread('tmp.png')
img = np.expand_dims(img, axis=0)
img = tf.cast(img, tf.float32)
logits = cifar10.inference(img)
Then pass logits to eval_once and modify eval once to evaluate logits:
def eval_once(saver, summary_writer, top_k_op, logits, summary_op):
...
while step < num_iter and not coord.should_stop():
predictions = sess.run([top_k_op])
print(sess.run(logits))
There is no separate script to run this method of inference, just run cifar10_eval.py which will now read a file from the user defined location with a batch size of one.
Here's how I ran a single image at a time. I'll admit it seems a bit hacky with the reuse of getting the scope.
This is a helper function
def restore_vars(saver, sess, chkpt_dir):
""" Restore saved net, global score and step, and epsilons OR
create checkpoint directory for later storage. """
sess.run(tf.initialize_all_variables())
checkpoint_dir = chkpt_dir
if not os.path.exists(checkpoint_dir):
try:
os.makedirs(checkpoint_dir)
except OSError:
pass
path = tf.train.get_checkpoint_state(checkpoint_dir)
#print("path1 = ",path)
#path = tf.train.latest_checkpoint(checkpoint_dir)
print(checkpoint_dir,"path = ",path)
if path is None:
return False
else:
saver.restore(sess, path.model_checkpoint_path)
return True
Here is the main part of the code that runs a single image at a time within the for loop.
to_restore = True
with tf.Session() as sess:
for i in test_img_idx_set:
# Gets the image
images = get_image(i)
images = np.asarray(images,dtype=np.float32)
images = tf.convert_to_tensor(images/255.0)
# resize image to whatever you're model takes in
images = tf.image.resize_images(images,256,256)
images = tf.reshape(images,(1,256,256,3))
images = tf.cast(images, tf.float32)
saver = tf.train.Saver(max_to_keep=5, keep_checkpoint_every_n_hours=1)
#print("infer")
with tf.variable_scope(tf.get_variable_scope()) as scope:
if to_restore:
logits = inference(images)
else:
scope.reuse_variables()
logits = inference(images)
if to_restore:
restored = restore_vars(saver, sess,FLAGS.train_dir)
print("restored ",restored)
to_restore = False
logit_val = sess.run(logits)
print(logit_val)
Here is an alternative implementation to the above using place holders it's a bit cleaner in my opinion. but I'll leave the above example for historical reasons.
imgs_place = tf.placeholder(tf.float32, shape=[my_img_shape_put_here])
images = tf.reshape(imgs_place,(1,256,256,3))
saver = tf.train.Saver(max_to_keep=5, keep_checkpoint_every_n_hours=1)
#print("infer")
logits = inference(images)
restored = restore_vars(saver, sess,FLAGS.train_dir)
print("restored ",restored)
with tf.Session() as sess:
for i in test_img_idx_set:
logit_val = sess.run(logits,feed_dict={imgs_place=i})
print(logit_val)
got it working with this
softmax = gn.inference(image)
saver = tf.train.Saver()
ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
with tf.Session() as sess:
saver.restore(sess, ckpt.model_checkpoint_path)
softmaxval = sess.run(softmax)
print(softmaxval)
output
[[ 6.73550041e-03 4.44930716e-04 9.92570221e-01 1.00681427e-06
3.05406687e-08 2.38927707e-04 1.89839399e-12 9.36238484e-06
1.51646684e-09 3.38977535e-09]]
I don't have working code for you I'm afraid, but here's how we often tackle this problem in production:
Save out the GraphDef to disk, using something like write_graph.
Use freeze_graph to load the GraphDef and checkpoints, and save out a GraphDef with the Variables converted into Constants.
Load the GraphDef in something like label_image or classify_image.
For your example this is overkill, but I would at least suggest serializing the graph in the original example as a GraphDef, and then loading it in your script (so you don't have to duplicate the code generating the graph). With the same graph created, you should be able to populate it from a SaverDef, and the freeze_graph script may help as an example.
Based on the Tensorflow tutorial for ConvNet, some points are not readily apparent to me:
are the images being distorted actually added to the pool of original images?
or are the distorted images used instead of the originals?
how many distorted images are being produced? (i.e. what augmentation factor was defined?)
The flow of functions for the tutorial seems to be as follows:
cifar_10_train.py
def train
"""Train CIFAR-10 for a number of steps."""
with tf.Graph().as_default():
[...]
# Get images and labels for CIFAR-10.
images, labels = cifar10.distorted_inputs()
[...]
cifar10.py
def distorted_inputs():
"""Construct distorted input for CIFAR training using the Reader ops.
Returns:
images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
labels: Labels. 1D tensor of [batch_size] size.
Raises:
ValueError: If no data_dir
"""
if not FLAGS.data_dir:
raise ValueError('Please supply a data_dir')
data_dir = os.path.join(FLAGS.data_dir, 'cifar-10-batches-bin')
return cifar10_input.distorted_inputs(data_dir=data_dir,
batch_size=FLAGS.batch_size)
and finally cifar10_input.py
def distorted_inputs(data_dir, batch_size):
"""Construct distorted input for CIFAR training using the Reader ops.
Args:
data_dir: Path to the CIFAR-10 data directory.
batch_size: Number of images per batch.
Returns:
images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
labels: Labels. 1D tensor of [batch_size] size.
"""
filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i) for i in xrange(1, 6)]
for f in filenames:
if not tf.gfile.Exists(f):
raise ValueError('Failed to find file: ' + f)
# Create a queue that produces the filenames to read.
filename_queue = tf.train.string_input_producer(filenames)
# Read examples from files in the filename queue.
read_input = read_cifar10(filename_queue)
reshaped_image = tf.cast(read_input.uint8image, tf.float32)
height = IMAGE_SIZE
width = IMAGE_SIZE
# Image processing for training the network. Note the many random
# distortions applied to the image.
# Randomly crop a [height, width] section of the image.
distorted_image = tf.random_crop(reshaped_image, [height, width, 3])
# Randomly flip the image horizontally.
distorted_image = tf.image.random_flip_left_right(distorted_image)
# Because these operations are not commutative, consider randomizing
# the order their operation.
distorted_image = tf.image.random_brightness(distorted_image, max_delta=63)
distorted_image = tf.image.random_contrast(distorted_image, lower=0.2, upper=1.8)
# Subtract off the mean and divide by the variance of the pixels.
float_image = tf.image.per_image_whitening(distorted_image)
# Ensure that the random shuffling has good mixing properties.
min_fraction_of_examples_in_queue = 0.4
min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
min_fraction_of_examples_in_queue)
print('Filling queue with %d CIFAR images before starting to train.'
'This will take a few minutes.' % min_queue_examples)
# Generate a batch of images and labels by building up a queue of examples.
return _generate_image_and_label_batch(float_image, read_input.label,
min_queue_examples, batch_size,
shuffle=True)
are the images being distorted actually added to the pool of original images?
It depends on the definition of the pool. In tensorflow, you have ops which are basic objects in your network graph. here, data production is an op itself. Thus you do not have a finite set of training samples, instead you have a potentialy infinite set of samples generated from the training set.
or are the distorted images used instead of the originals?
As you can see from the source you included - sample is taken from the training batch, then it is randomly transformed, thus there is very small probability of using unaltered image (especially that cropping is used, which always modifies).
how many distorted images are being produced? (i.e. what augmentation factor was defined?)
There is no such thing, this is never ending process. Think about this in terms of random access to possibly infinite source of data, as this is what is efficiently happening here. Every single batch can be different from the previous one.