Processing batch of differently-shaped tensors in tensorflow - python

Suppose we have a batch of images as a 4D tensor of shape (batches, height, width, channels), and suppose we have some image-transforming functions f0(img...), f1(img...), ..., where each function processes just one image (not a batch) represented as a 3D tensor. The output tensors of these functions may differ in shape from the input tensors, some functions may produce more than one tensor, and the functions may take extra arguments besides the image.
Also suppose that we use non-eager execution mode, meaning that we first build a graph into a .pb file and the graph is later executed, mostly on the GPU, for efficiency.
As is common in TF, the size of the first (batch) dimension may be unknown, which is signified by the value None. The real size of this dimension is only known at graph evaluation time from the input data (the user may feed batches of varying sizes).
We're given a sequence of required image transformations specified as a list of functions with extra arguments. The task is to process the input batch of images through these functions, and to do it in the most efficient way: tf.py_functions are not allowed, meaning that if all transforming functions are implemented and run entirely on the GPU, intermediate results must not travel back and forth between GPU and CPU as inputs to py_functions.
The main difficulty is that the batch count is not known (it equals None), so we can't use a Python loop to process each image through a function. Of course, we could fix some maximum possible batch size and create a loop up to that maximum, and for iterations with no input conditionally skip processing and pass an empty tensor forward. Another problem is that at each stage the input and output list lengths may differ, e.g. when an image transformation splits the input image into 3-4 tensors of unequal shape or does a multi-crop with different windows.
I think the right approach would be to use tf.while_loop somehow, but I don't understand how it works, nor how to apply it to the case where each stage has a different number of tensors of different shapes.
There is tf.map_fn, which is perfect for the case where all inputs have the same shape and so do all outputs; then we can pass a single 4D tensor down the transformation path. But this is not my case: inputs can differ in shape and so can outputs.
Maybe there is something like Python's list but on the TF side? I mean a list that keeps tensors of different shapes, but is TF-only and does not leave the GPU the way a Python list does. If such a list exists and there is an analogue of tf.map_fn for it, then we could map one function over a list of differently shaped tensors. That would largely solve my task, or at least help a lot.
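To make the tf.while_loop idea a bit more concrete, here is roughly what I imagine (per_image_fn is a made-up stand-in for one of the transforming functions, and I am not sure this is the right approach):
import tensorflow as tf  # TF 1.x, non-eager graph mode assumed

def per_image_fn(img):
    # Stand-in for a real transformation; here it just downsamples,
    # so the output shape differs from the input shape.
    return img[::2, ::2, :]

images = tf.placeholder(tf.float32, shape=[None, None, None, 3])  # batch size unknown
batch_size = tf.shape(images)[0]  # dynamic size, known only at run time

# tf.TensorArray is the closest thing to a "TF-side list": it lives in the graph,
# and with infer_shape=False its elements may have different shapes.
results = tf.TensorArray(dtype=tf.float32, size=batch_size, infer_shape=False)

def cond(i, ta):
    return i < batch_size

def body(i, ta):
    out = per_image_fn(images[i])     # process one image
    return i + 1, ta.write(i, out)    # store the (possibly differently shaped) result

_, results = tf.while_loop(cond, body, [tf.constant(0), results])

# Individual results can be read back with results.read(k); results.stack()
# only works if all elements happen to share a shape.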

Related

Tensorflow 2.0: Best way for structure the output of `tf.data.Dataset` in multiple inputs scenario

I'm building a GAN in TensorFlow for image deblurring; it's an implementation of DeblurGANv2. I set up the GAN so that it has two inputs: a batch of blurred images and a batch of sharp images. Along these lines, I designed the input as a Python dictionary with two keys, ['sharp', 'blur'], each holding a tensor of shape [batch_size, 512, 512, 3]. This makes it easy to feed the blurred batch to the generator, and then feed the generator's output together with the sharp batch to the discriminator.
Based on those requirements, I created a tf.data.Dataset that outputs exactly that: a dict containing the two tensors, each with its own batch dimension. This fits my GAN implementation perfectly, and everything works smoothly.
So keep in mind: my input is not a tensor but a Python dict that has no outer batch dimension; this will be relevant when I explain my problem later.
Recently I decided to add support for distributed training using TensorFlow distribution strategies. This TensorFlow feature lets you distribute training over multiple devices, and even over multiple machines. Some of the implementations, for example MirroredStrategy, take the input tensor, split it into equal parts, and feed each slice to a different device. That means that with a batch size of 16 and 4 GPUs, each GPU ends up with a local batch of 4 data points; after that there is some magic to aggregate the results and other details that are not relevant to my problem.
As you have already noticed, it is critical for distribution strategies to have a tensor as input, or at least some sort of input with an outer batch dimension, and what I have is a Python dict whose batch dimension lives inside the dictionary's tensor values. This is a huge problem: my current implementation is not compatible with distributed training.
I have been looking for workarounds, but I can't quite wrap my head around this. Maybe just make the input one big tensor of shape=[batch_size, 2, 512, 512, 3] and slice it? I'm not sure; this only just occurred to me. Anyway, it seems ambiguous: I can't tell the two inputs apart, at least not with the clarity of the dictionary keys. Edit: the problem with this solution is that it makes my dataset transformations very expensive and therefore the dataset throughput a lot slower; given that this is an image-loading pipeline, that is a major concern.
My explanation of how distribution strategies work may not be the most rigorous one, so if I'm missing something, please feel free to correct me.
PS: This is not a bug or code-error question; it's mostly a system-design query. I hope that's not off-topic here.
Instead of using a dictionary as the input to the GAN, you can try mapping a function in the following way:
import glob
import tensorflow as tf

def load_image(fileA, fileB):
    # Read and decode the blurred/sharp image pair from disk.
    imageA = tf.io.read_file(fileA)
    imageA = tf.image.decode_jpeg(imageA, channels=3)
    imageB = tf.io.read_file(fileB)
    imageB = tf.image.decode_jpeg(imageB, channels=3)
    return imageA, imageB

trainA = glob.glob('blur/*.jpg')
trainB = glob.glob('sharp/*.jpg')

# Build a dataset of (blurred, sharp) pairs with an outer batch dimension.
train_dataset = tf.data.Dataset.from_tensor_slices((trainA, trainB))
train_dataset = train_dataset.map(load_image).batch(batch_size)

# For mirrored strategy:
mirrored_strategy = tf.distribute.MirroredStrategy()
dist_dataset = mirrored_strategy.experimental_distribute_dataset(train_dataset)
You can iterate over the dataset and update the network by passing both images.
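For example, a rough sketch of the training loop might look like this (train_step is assumed to be a function you define that updates the generator and discriminator, with the models created inside mirrored_strategy.scope()):
@tf.function
def distributed_train_step(blur_batch, sharp_batch):
    # Run the per-replica training step on every device.
    # (On older TF 2.x this method was called experimental_run_v2.)
    mirrored_strategy.run(train_step, args=(blur_batch, sharp_batch))

for blur_batch, sharp_batch in dist_dataset:
    distributed_train_step(blur_batch, sharp_batch)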
I hope this helps !

Keras combine sequence of input data with static features

I am currently struggling with the problem of combining static features with a sequence of input data within a batch.
I have two channels of input data; one is processed via a convolutional neural network (e.g. VGG-16 or comparable) and outputs a feature map.
My second input channel contains a list (of variable length) of input data.
Each single entry of that list, together with the calculated feature map, should be fed into a classifier.
I know that I can use a TimeDistributed wrapper to process sequences of data, but that only partially solves my problem:
The calculation of the feature map in the first input channel is costly and should only be performed once per batch.
As the list in the second channel has a variable length, I cannot use a repeat layer to duplicate the feature map properly; additionally, I run into memory problems because I cannot hold several hundred (or thousand) copies of the feature map in GPU memory.
What is the best way to properly combine static data (computed once per batch) with a sequence of data?
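For reference, a rough sketch of the kind of wiring I have in mind (the shapes are made up, and I am not sure this is the right way to do it): compute the CNN features once, then tile them along the dynamic sequence dimension and concatenate with each sequence entry:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

image_in = keras.Input(shape=(224, 224, 3))   # static input, one image per sample
seq_in = keras.Input(shape=(None, 64))        # variable-length sequence of vectors

cnn = keras.applications.VGG16(include_top=False, pooling='avg')
feat = cnn(image_in)                          # computed once, shape (batch, 512)

def tile_to_sequence(args):
    feat, seq = args
    steps = tf.shape(seq)[1]                  # dynamic sequence length
    return tf.tile(feat[:, tf.newaxis, :], tf.stack([1, steps, 1]))

tiled = layers.Lambda(tile_to_sequence)([feat, seq_in])
merged = layers.Concatenate(axis=-1)([tiled, seq_in])
out = layers.TimeDistributed(layers.Dense(1, activation='sigmoid'))(merged)

model = keras.Model([image_in, seq_in], out)
(Tiling avoids recomputing the CNN per sequence entry, though it still materializes one copy of the feature vector per step.)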

How does the `my_input_fn` in the getting started with TensorFlow allow enumeration over the data?

I'm looking at First Steps with TensorFlow as part of the Google machine learning crash course and I'm already confused. My understanding is (please correct me if I'm wrong):
Step 4 defines an input function my_input_fn that formats the data into the relevant TensorFlow structures (Tensors).
Step 5 then supplies this function to the train call.
The intention is that the train call will make successive calls to my_input_fn to get successive batches of data to adjust the model on. (??? very suspect on this now)
my_input_fn is defined here:
import numpy as np
from tensorflow.python.data import Dataset

def my_input_fn(features, targets, batch_size=1, shuffle=True, num_epochs=None):
    """Trains a linear regression model of one feature.

    Args:
      features: pandas DataFrame of features
      targets: pandas DataFrame of targets
      batch_size: Size of batches to be passed to the model
      shuffle: True or False. Whether to shuffle the data.
      num_epochs: Number of epochs for which data should be repeated. None = repeat indefinitely
    Returns:
      Tuple of (features, labels) for next data batch
    """
    # Convert pandas data into a dict of np arrays.
    features = {key: np.array(value) for key, value in dict(features).items()}

    # Construct a dataset, and configure batching/repeating.
    ds = Dataset.from_tensor_slices((features, targets))  # warning: 2GB limit
    ds = ds.batch(batch_size).repeat(num_epochs)

    # Shuffle the data, if specified.
    if shuffle:
        ds = ds.shuffle(buffer_size=10000)

    # Return the next batch of data.
    features, labels = ds.make_one_shot_iterator().get_next()
    return features, labels
From my reading of my_input_fn, I don't understand how this happens. I only have a rudimentary knowledge of python but my reading of the function is that every call to it will re-initialise the tensor structures from the pandas frames, get an iterator and then return the first element of it. Every time it is called. Sure, in the case of this example, if the data is shuffled (which it is by default) and the dataset is big it's unlikely you'll get duplicates for a step of 100, but this smells of sloppy programming (i.e. in the case it isn't shuffled it would always return the same first training data set) so I doubt this is the case.
My next suspicion is that the one_shot_iterator().get_next() call is doing some interesting/wacky/tricky stuff. Like returning some sort of late eval structure that will allow the train function to enumerate to the next batch from itself as opposed to re-invoking my_input_fn?
But honestly I'd like to clarify this because at this stage more hours than I care to think about later I am not any closer to understanding.
My attempts to research have just led to further confusion.
The tutorial suggests reading this - at one point it says "The train, evaluate, and predict methods of every Estimator require input functions to return a (features, label) pair containing tensorflow tensors.". Okay, this is in line with my original thoughts: basically the example and label packaged in TensorFlow structures.
But then it shows the results of what it returns and it is stuff like this (example):
({
'SepalLength': <tf.Tensor 'IteratorGetNext:2' shape=(?,) dtype=float64>,
'PetalWidth': <tf.Tensor 'IteratorGetNext:1' shape=(?,) dtype=float64>,
'PetalLength': <tf.Tensor 'IteratorGetNext:0' shape=(?,) dtype=float64>,
'SepalWidth': <tf.Tensor 'IteratorGetNext:3' shape=(?,) dtype=float64>},
Tensor("IteratorGetNext_1:4", shape=(?,), dtype=int64))
In the code lab, my_input_fn(my_feature, targets) returns:
({'total_rooms': <tf.Tensor 'IteratorGetNext:0' shape=(?,) dtype=float64>},
)
I have NO IDEA what to make of this. My reading of tensors does not mention anything like this. I don't even know how to BEGIN interrogating this with my rudimentary Python and non-existent TensorFlow knowledge.
The documentation for the one shot iterator says it creates an iterator for enumerating the elements. Again, this is in line with my thinking.
The get_next documentation says:
Returns a nested structure of tf.Tensors containing the next element.
I don't know how to parse this. What sort of nested structure? I mean it looks like a tuple but why wouldn't you just say tuple? What dictates this? Where is it described? Surely it is important?
What am I misunderstanding here?
(For a course that purportedly requires no prior knowledge of TensorFlow, the google machine learning crash course is making me feel pretty moronic. I'm genuinely curious as to how others in my situation are going with this.)
The input function (in this case my_input_fn) is not called repeatedly. It is called once, creates a bunch of tensorflow ops (for creating a dataset, shuffling it etc.) and finally returns the get_next op of the iterator. This op will be called repeatedly, but all it does is iterate over the dataset. The things you do in my_input_fn (such as shuffling, batching, repeating) only happen once.
In general: When working with Tensorflow programs, you have to get used to the fact that they work quite differently from "normal" Python programs. Most of the code you write (especially things with tf. in front) will only be executed once to build the computation graph, and then this graph is executed many times.
EDIT: However, there is the experimental tf.eager API (supposedly becoming fully integrated in TF 1.7) that changes exactly this, i.e. things are executed as you write them (more like numpy). This should allow for faster experimentation.
To go through the input function step by step: You start out with a dataset that you create from "tensor slices" (e.g. numpy arrays). Then you call the batch method. This essentially creates a new dataset, the elements of which are batches of elements of the original dataset. Similarly, repeating and shuffling also create new datasets (to be precise, they create ops that will create these datasets once they're actually executed as part of the computation graph). Finally, you return an iterator over the batched, repeated, shuffled dataset. Only this iterator's get_next op will execute repeatedly, returning new elements of the dataset until it is exhausted.
EDIT: Indeed iterator.get_next() only returns an op. The iteration is performed only once this op is run in a tf.Session.
As for the output that you have "no idea what to make of": Not sure what your question is exactly, but what you posted are just dicts mapping strings to tensors. The tensors automatically get names related to the op that produces them (iterator.get_next), and their shape is not known because batch size can be variable -- even specifying it, the last batch could be smaller if the batch size doesn't evenly divide the dataset size (e.g. dataset with 10 elements and batch size of 4 -- last batch is going to be size 2). ? elements in tensor shapes signify unknown dimensions.
EDIT: Regarding the naming: The ops receive default names. However they would all receive the same default name (IteratorGetNext) in this case, but there cannot be multiple ops with the same name. So Tensorflow automatically appends integers to make the names unique. That's all!
As for "nested structures": Input functions are often used with tf.estimator which expects a fairly simple input structure (a tuple containing a Tensor or dict of Tensors as input, and a Tensor as output if I'm not mistaken). However in general, input functions support more complex, nested output structures such as (a, (tuple, of), (tuples, (more, tuples, elements), and), words). Note that this is the structure of one output, i.e. one "step" of the iterator (e.g. a batch of data). Repeatedly calling this op will enumerate the whole dataset.
EDIT: What structure is returned by an input function is determined by just that function! E.g. a dataset from tensor slices will return tuples, where the nth element is the nth "tensor slice". There are functions such as dataset.zip that work just like the Python equivalent. If you take a dataset with structure (e1, e2) and zip it with a dataset (e3,), you get ((e1, e2), e3).
What format is needed depends on the application. In principle you could provide any format and then the code that receives this input could do anything with it. However, as I said, probably the most common use is in the context of tf.estimator, and there your input function is supposed to return a tuple (features, labels) where features is either a tensor or dict of tensors (as in your case) and labels is also a tensor or dict of tensors. If either is a dict, the model function is responsible for grabbing the correct values/tensors from there.
In general, I would advise you to play around with this stuff. Check out the tf.data API and of course the Programmer's Guide. Create some datasets/input functions and simply start a session and repeatedly run the iterator.get_next() op. See what comes out of there. Try all the different transformations such as zip, take, padded_batch... Seeing it in action without the need to actually do anything with this data should give you a better understanding.
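For instance, a minimal sketch of this build-once / run-many-times pattern (TF 1.x style, to match the code above):
import numpy as np
import tensorflow as tf

ds = tf.data.Dataset.from_tensor_slices(np.arange(5))   # tiny dataset: 0..4
ds = ds.batch(2)

next_batch = ds.make_one_shot_iterator().get_next()     # the graph is built once here

with tf.Session() as sess:
    try:
        while True:
            print(sess.run(next_batch))                  # [0 1], then [2 3], then [4]
    except tf.errors.OutOfRangeError:
        pass                                             # dataset exhausted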

TensorFlow: Why is there a need to reshape non-sparse elements once when parsing a TF-example from TFRecord files?

In the TensorFlow documentation at GitHub, there is this following code:
# Reshape non-sparse elements just once:
for k in self._keys_to_features:
    v = self._keys_to_features[k]
    if isinstance(v, parsing_ops.FixedLenFeature):
        example[k] = array_ops.reshape(example[k], v.shape)
I am wondering why there is a need to reshape a FixedLenFeature tensor after parsing it from a TFRecord file.
In fact, what is the difference between a FixedLenFeature and VarLenFeature and what is their relevance to a Tensor? I am loading images in this case, so why would all of them be classified as a FixedLenFeature? What is an example of a VarLenFeature?
Tensors are stored on disk without shape information in an Example protocol buffer format (TFRecord files are collections of Examples). The documentation in the .proto file describes things fairly well, but the basic point is that Tensor entries are stored in row-major order with no shape information, so that must be provided when the Tensors are read. Note that the situation is similar for storing Tensors in memory: the shape information is kept separately, and just reshaping a Tensor changes only metadata (things like transpose, on the other hand, can be expensive).
VarLenFeatures are sequences such as sentences which would be difficult to batch together as regular Tensors, since the resulting shape would be ragged. The parse_example documentation has some good examples. Images are fixed length in that, if you load a batch of them, they'll all have the same shape (e.g. they're all 32x32 pixels, so a batch of 10 can have shape 10x32x32).
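A small sketch of what that looks like in a feature spec (serialized_example is assumed to be a scalar string tensor read from a TFRecord file):
import tensorflow as tf

features = {
    # Stored flat in row-major order; the shape must be re-applied after parsing.
    'image': tf.FixedLenFeature([32 * 32 * 3], tf.float32),
    # Variable-length per example, so it comes back as a SparseTensor.
    'words': tf.VarLenFeature(tf.string),
}

parsed = tf.parse_single_example(serialized_example, features)
image = tf.reshape(parsed['image'], [32, 32, 3])   # metadata-only reshape
words = parsed['words']                            # tf.SparseTensor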

How to understand the term `tensor` in TensorFlow?

I am new to TensorFlow. While I am reading the existing documentation, I found the term tensor really confusing. Because of it, I need to clarify the following questions:
What is the relationship between tensor and Variable, tensor vs. tf.constant, and tensor vs. tf.placeholder?
Are they all types of tensors?
TensorFlow doesn't have first-class Tensor objects, meaning that there is no notion of a Tensor in the underlying graph that's executed by the runtime. Instead the graph consists of op nodes connected to each other, representing operations. An operation allocates memory for its outputs, which are available on endpoints :0, :1, etc., and you can think of each of these endpoints as a Tensor. If you have a tensor corresponding to nodename:0 you can fetch its value as sess.run(tensor) or sess.run('nodename:0'). Execution granularity happens at the operation level, so the run method will execute the op, which will compute all of the endpoints, not just the :0 endpoint. It's possible to have an Op node with no outputs (like tf.group), in which case there are no tensors associated with it. It is not possible to have tensors without an underlying Op node.
You can examine what happens in underlying graph by doing something like this
tf.reset_default_graph()
value = tf.constant(1)
print(tf.get_default_graph().as_graph_def())
So with tf.constant you get a single operation node, and you can fetch it using sess.run("Const:0") or sess.run(value)
Similarly, value=tf.placeholder(tf.int32) creates a regular node with name Placeholder, and you could feed it as feed_dict={"Placeholder:0":2} or feed_dict={value:2}. You can not feed and fetch a placeholder in the same session.run call, but you can see the result by attaching a tf.identity node on top and fetching that.
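A quick sketch of that:
tf.reset_default_graph()
value = tf.placeholder(tf.int32)          # creates an op node named "Placeholder"
passthrough = tf.identity(value)          # "Identity" node whose :0 endpoint can be fetched

with tf.Session() as sess:
    print(sess.run(passthrough, feed_dict={value: 2}))              # 2
    print(sess.run("Identity:0", feed_dict={"Placeholder:0": 2}))   # same thing, by name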
For variable
tf.reset_default_graph()
value = tf.Variable(tf.ones_initializer()(()))
value2 = value+3
print(tf.get_default_graph().as_graph_def())
You'll see that it creates two nodes Variable and Variable/read, the :0 endpoint is a valid value to fetch on both of these nodes. However Variable:0 has a special ref type meaning it can be used as an input to mutating operations. The result of Python call tf.Variable is a Python Variable object and there's some Python magic to substitute Variable/read:0 or Variable:0 depending on whether mutation is necessary. Since most ops have only 1 endpoint, :0 is dropped. Another example is Queue -- close() method will create a new Close op node which connects to Queue op. To summarize -- operations on python objects like Variable and Queue map to different underlying TensorFlow op nodes depending on usage.
For ops like tf.split or tf.nn.top_k which create nodes with multiple endpoints, Python's session.run call automatically wraps output in tuple or collections.namedtuple of Tensor objects which can be fetched individually.
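For instance:
tf.reset_default_graph()
values, indices = tf.nn.top_k([1, 3, 2], k=2)   # one "TopKV2" node with two endpoints
print(values)    # Tensor("TopKV2:0", shape=(2,), dtype=int32)
print(indices)   # Tensor("TopKV2:1", shape=(2,), dtype=int32)

with tf.Session() as sess:
    print(sess.run((values, indices)))           # (array([3, 2], dtype=int32), array([1, 2], dtype=int32))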
From the glossary:
A Tensor is a typed multi-dimensional array. For example, a 4-D array of floating point numbers representing a mini-batch of images with dimensions [batch, height, width, channel].
Basically, every data is a Tensor in TensorFlow (hence the name):
placeholders are Tensors to which you can feed a value (with the feed_dict argument in sess.run())
Variables are Tensors which you can update (with var.assign()). Technically speaking, tf.Variable is not a subclass of tf.Tensor though
tf.constant is just the most basic Tensor, which contains a fixed value given when you create it
However, in the graph, every node is an operation, which can have Tensors as inputs or outputs.
As already mentioned by others, yes they are all tensors.
The way I understood those is to first visualize and understand 1D, 2D, 3D, 4D, 5D, and 6D tensors as in the picture below. (source: knoldus)
Now, in the context of TensorFlow, you can imagine a computation graph like the one below,
Here, the Ops take two tensors a and b as input, multiply each tensor with itself, and then add the results of these multiplications to produce the result tensor t3. These multiplication and addition Ops happen at the nodes of the computation graph.
And the tensors a and b can be constant tensors, Variable tensors, or placeholders. It doesn't matter, as long as they have the same data type and compatible (or broadcastable) shapes for the operations involved.
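A tiny code version of that graph (made-up values; a and b are constants here, but could just as well be Variables or placeholders):
import tensorflow as tf

a = tf.constant([1.0, 2.0])
b = tf.constant([3.0, 4.0])
t3 = tf.add(tf.multiply(a, a), tf.multiply(b, b))   # two multiply nodes feeding one add node

with tf.Session() as sess:
    print(sess.run(t3))   # [10. 20.]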
Data is stored in matrices. A 28x28 pixel grayscale image fits into a 28x28 two-dimensional matrix. But for a color image, we need more dimensions. There are 3 color values per pixel (Red, Green, Blue), so a three-dimensional table will be needed with dimensions [28, 28, 3]. And to store a batch of 128 color images, a four-dimensional table is needed with dimensions [128, 28, 28, 3].
These multi-dimensional tables are called "tensors" and the list of their dimensions is their "shape".
Source
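In code, those shapes look like this (a small sketch):
import tensorflow as tf

grayscale = tf.zeros([28, 28])         # one 28x28 grayscale image
color = tf.zeros([28, 28, 3])          # one color image: 3 values per pixel
batch = tf.zeros([128, 28, 28, 3])     # a batch of 128 color images
print(batch.shape)                     # (128, 28, 28, 3)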
TensorFlow's central data type is the tensor. Tensors are the underlying components of computation and a fundamental data structure in TensorFlow. Without using complex mathematical interpretations, we can say that a tensor (in TensorFlow) describes a multidimensional numerical array, holding a zero- or n-dimensional collection of data, determined by rank, shape, and type. Read more: What is tensors in TensorFlow?
