How and when does quantization work in a TFLite Graph?

I have a TF model which was trained with quantization, frozen, converted to tflite with TOCO, and now I have the TFLite HTML Graph Model and json.
I can see that every tensor in my graph has quantization attributes (min, max, scale, zero-pt), and I'm trying to determine how each of these attributes applies to each tensor.
For instance, I understand the representation of quantized data, and I understand that taking the quantized weights/biases, multiplying by scale, and adding the minimum value returns (almost) the original weights/biases.
What I don't understand:
Why do some tensors (e.g. Relu, Sigmoid) have quantization attributes but no intrinsic parameters (as weights and biases do)? Is it because they are output tensors, and the quantization is applied before the data is fed into the next operation?
At what points (if any) is quantization applied as data flows through the model? For example, say an image tensor of floats is passed to a conv2d operation: where and how are the quantization attributes of the weights, bias, and relu used to compute the output of the conv2d operation?
Essentially, if I parsed the TFLite model's data into numpy arrays, what would I need to know about the flow of data through the network (with respect to quantization) in order to recreate the model for inference from scratch?
I can't seem to find any documentation regarding this. Any help would be appreciated.
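For reference, the uint8 scheme described in the CVPR 2018 paper linked in the answer maps a real value r to a quantized value q with r = scale * (q - zero_point). Because the range is nudged so that min = -scale * zero_point, this is the same as "multiply by scale and add min", up to rounding. The helper names below are hypothetical and the range nudging is simplified; this is a sketch, not TFLite's converter code.

```python
import numpy as np

def choose_qparams(rmin, rmax, qmin=0, qmax=255):
    """Derive (scale, zero_point) for an asymmetric uint8 range (simplified)."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)   # the range must contain 0.0
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))  # the q value that represents 0.0
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point

def quantize(r, scale, zero_point, qmin=0, qmax=255):
    q = np.round(r / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dequantize(q, scale, zero_point):
    # Widen before subtracting so uint8 arithmetic cannot wrap around.
    return scale * (q.astype(np.int32) - zero_point)

w = np.array([-1.0, -0.25, 0.0, 0.5, 1.5])
scale, zp = choose_qparams(w.min(), w.max())
q = quantize(w, scale, zp)
w_hat = dequantize(q, scale, zp)   # close to w, within scale / 2
```

Note that 0.0 is represented exactly (by q == zero_point), which is why zero padding and ReLU outputs introduce no quantization error.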

The convolution inner loop performs multiply-accumulate (MAC) operations on uint8 values. A smaller outer loop computes the zero-point offset terms of the MAC. At the end of each kernel convolution you need to downscale the int32 accumulator to the 8-bit uint8 range using a multiplier equal to input_scale * kernel_scale / output_scale. These three scale values are determined during training and are stored in the .tflite file. This paper (Jacob et al., "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference", CVPR 2018) explains the operations:
http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf
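A minimal NumPy sketch of one output element of a quantized conv2d (a single kernel window flattened to a vector), assuming uint8 activations and weights, an int32 bias with scale input_scale * kernel_scale and zero point 0, and a float requantization multiplier. TFLite itself represents the multiplier as a fixed-point integer plus a shift, and fused activations such as ReLU are applied by clamping the output range; both details are left out here. The function name and the example scales are made up for illustration.

```python
import numpy as np

def quantized_dot(x_q, w_q, bias_q, x_zp, w_zp, x_scale, w_scale, y_scale, y_zp):
    """One conv2d output element from uint8 inputs/weights and an int32 bias."""
    x = x_q.astype(np.int32)
    w = w_q.astype(np.int32)
    n = x.size
    # Inner loop: plain uint8 x uint8 multiply-accumulate into an int32 accumulator.
    acc = np.sum(x * w, dtype=np.int32)
    # Zero-point ("z-offset") correction terms; these only need the sums of x and w.
    acc -= w_zp * np.sum(x, dtype=np.int32)
    acc -= x_zp * np.sum(w, dtype=np.int32)
    acc += n * x_zp * w_zp
    # Bias has scale x_scale * w_scale and zero point 0, so it adds directly.
    acc += bias_q
    # Downscale to the output's uint8 range with M = x_scale * w_scale / y_scale.
    m = x_scale * w_scale / y_scale
    y = int(np.round(acc * m)) + y_zp
    return np.uint8(np.clip(y, 0, 255))

rng = np.random.default_rng(0)
x_scale, x_zp = 1 / 255, 0      # activations in [0, 1]
w_scale, w_zp = 2 / 255, 128    # weights roughly in [-1, 1]
y_scale, y_zp = 20 / 255, 128   # outputs roughly in [-10, 10]
x_q = rng.integers(0, 256, size=9, dtype=np.uint8)   # one 3x3 window, one channel
w_q = rng.integers(0, 256, size=9, dtype=np.uint8)
bias_q = np.int32(50)

y_q = quantized_dot(x_q, w_q, bias_q, x_zp, w_zp, x_scale, w_scale, y_scale, y_zp)

# Float reference computed from the dequantized operands.
x_f = x_scale * (x_q.astype(np.int32) - x_zp)
w_f = w_scale * (w_q.astype(np.int32) - w_zp)
y_ref = np.dot(x_f, w_f) + x_scale * w_scale * int(bias_q)
y_hat = y_scale * (int(y_q) - y_zp)
```

Because the int32 accumulator is exact, the only error introduced here is the final rounding, so the dequantized result y_hat stays within half an output step of the float reference.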

Related

Possible to use differently shaped outputs of a multi-headed Keras model in a single loss function?

I'm interested in using multiple outputs from a Keras model in a single objective function.
I can't simply use a concatenate layer, as the outputs are of different size. As an example: consider a model that will do the standard digit recognition task on the MNIST dataset. I need the model to output one tensor of size (784,) (the input shape after the images have been flattened) and one of size (10,) (the class probabilities), and use them both in a single custom loss function.

How to create a custom layer for Sampling in Keras Tensorflow?

I'm building a CNN in Keras with a Tensorflow backend, and I'd like to introduce a custom layer that should perform the following:
Output a tensor of same shape and dtype as the input tensor.
The output is made of a few samples of the input tensor, say 25%. The rest of the output tensor should be zeros.
The samples must be picked at random, such that the highest-valued pixels are sampled with higher probability. In other words, the probability distribution should be the input tensor itself (normalized).
For now, I've managed to build a mockup where I pick the top 25% of pixels of the input tensor and create an output tensor of the same size only from them. But it is not random sampling.
Ideally I'd like to use a TensorFlow equivalent of np.random.choice(input_tensor, num_samples, p=input_tensor_normalized), where p is the probability distribution to follow. Note that this only works on a 1D np.array.
I've heard of tf.random.multinomial, but it's deprecated, and tf.random.categorical takes logits as input (I don't think that's my case) rather than a probability distribution.
A possibility is to reshape the input tensor as a vector, perform 1D sampling in Tensorflow if there is a way, construct a similar vector with the sampled values at the corresponding index and zeros elsewhere, and then reshape as a tensor afterwards.
Any other idea?
Should I move to PyTorch?
Thank you very much

You can still use tf.random.categorical. Logits are just unnormalized log probabilities, so if you already have your probability distribution, you can pass its log. Since tf.random.categorical expects 2-D logits of shape [batch_size, num_classes], flatten the tensor first:
samples = tf.random.categorical(tf.math.log(tf.reshape(input_tensor_normalized, [1, -1])), num_samples)
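tf.random.categorical returns flat indices (sampled with replacement), so the rest of the approach from the question, putting the sampled values back at those indices with zeros elsewhere and reshaping, is still needed; in TensorFlow that step could use tf.scatter_nd. The NumPy sketch below shows the full reshape, sample, scatter logic as a reference for what such a layer should compute. It samples without replacement, assumes a non-negative input with at least k nonzero entries, and sample_pixels is a hypothetical name.

```python
import numpy as np

def sample_pixels(x, fraction=0.25, rng=None):
    """Keep a random subset of x's entries, drawn with probability proportional to value."""
    rng = np.random.default_rng() if rng is None else rng
    flat = x.reshape(-1)                      # 1-D view, as np.random choice needs
    probs = flat / flat.sum()                 # the input itself, normalized (assumes x >= 0)
    k = int(fraction * flat.size)
    # Draw k distinct flat indices; larger values are more likely to be picked.
    idx = rng.choice(flat.size, size=k, replace=False, p=probs)
    out = np.zeros_like(flat)
    out[idx] = flat[idx]                      # sampled values at their positions, zeros elsewhere
    return out.reshape(x.shape)

rng = np.random.default_rng(0)
image = rng.random((8, 8)) + 0.01             # strictly positive test input
sampled = sample_pixels(image, fraction=0.25, rng=rng)
```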

Creating and training only specified weights in TensorFlow or PyTorch

I am wondering if there is a way in TensorFlow, PyTorch or some other library to selectively connect neurons. I want to make a network with a very large number of neurons in each layer, but that has very few connections between layers.
Note that I do not think this is a duplicate of this answer: "Selectively zero weights in TensorFlow?". I implemented a custom Keras layer using essentially the same method that appears in that question: a dense layer where all but the specified weights are ignored in training and evaluation. This fulfills part of what I want by not training the specified weights and not using them for prediction. But the problem is that I still waste memory storing the untrained weights, and I waste time calculating the gradients of the zeroed weights. What I would like is for the computation of the gradient matrices to involve only sparse matrices, so that I do not waste time and memory.
Is there a way to selectively create and train weights without wasting memory? If my question is unclear or there is more information that it would be helpful for me to provide, please let me know. I would like to be helpful as a question-asker.

The usual, simple solution is to initialize your weight matrices with zeros where there should be no connection. You store a mask of the locations of these zeros and set the weights at those positions to zero after each weight update. This is necessary because the gradient for zero weights may be nonzero, which would introduce nonzero weights (i.e. connections) where you don't want any.
Pseudocode (sparse_init and train_operation stand for your own initialization and training step):
# setup network
weights = sparse_init()      # only nonzero for existing connections
zero_mask = (weights == 0)   # boolean mask of the missing connections
# train
for e in range(num_epochs):
    train_operation()        # may lead to introduction of new connections
    weights[zero_mask] = 0   # so we set them to zero again
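A runnable version of the pseudocode above, as a sketch with NumPy and a single linear layer trained by gradient descent on random data (the sizes, learning rate, and data are arbitrary). The mask is kept as an explicit boolean array instead of being recomputed from weights == 0. As the question points out, this still stores and computes the full dense gradient; the mask only keeps the unwanted connections at zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 6, 4

# Setup network: keep roughly 20% of the possible connections.
connected = rng.random((n_out, n_in)) < 0.2
weights = np.where(connected, rng.normal(size=(n_out, n_in)), 0.0)
zero_mask = ~connected                       # positions with no connection

# Toy regression data for a single linear layer y = x @ weights.T.
X = rng.normal(size=(32, n_in))
Y = rng.normal(size=(32, n_out))
lr = 0.1

for epoch in range(100):
    pred = X @ weights.T
    grad = 2 * (pred - Y).T @ X / len(X)     # dense MSE gradient, generally nonzero everywhere
    weights -= lr * grad                     # may introduce new connections...
    weights[zero_mask] = 0.0                 # ...so set them back to zero
```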

Both TensorFlow and PyTorch support sparse tensors (tf.sparse, torch.sparse).
My intuitive understanding is that if you are willing to write your network using the respective low-level APIs (e.g. actually implementing the forward pass yourself), you could cast your weight matrices as sparse tensors. That would in turn give sparse connectivity, since the weight matrix of layer [L] defines the connectivity between the neurons of the previous layer [L-1] and the neurons of layer [L].
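To illustrate why sparse storage saves memory and time, here is a NumPy sketch of a layer whose weight matrix is stored only as (row, col, value) triplets for the existing connections. Both the forward pass and the gradients touch only those entries. The function names are hypothetical; in practice you would use the libraries' sparse ops (for example tf.sparse.sparse_dense_matmul or torch.sparse.mm).

```python
import numpy as np

def sparse_forward(rows, cols, vals, x, n_out):
    """y = W @ x, where W is stored only as COO triplets (rows[k], cols[k], vals[k])."""
    y = np.zeros(n_out)
    np.add.at(y, rows, vals * x[cols])       # scatter-add each connection's contribution
    return y

def sparse_backward(rows, cols, vals, x, dy):
    """Given dL/dy, return dL/dvals (one entry per connection) and dL/dx."""
    dvals = dy[rows] * x[cols]               # gradients exist only for stored weights
    dx = np.zeros_like(x)
    np.add.at(dx, cols, vals * dy[rows])
    return dvals, dx

# A 3x5 layer with only 5 of its 15 possible connections.
rng = np.random.default_rng(0)
rows = np.array([0, 0, 1, 2, 2])
cols = np.array([1, 4, 0, 2, 3])
vals = rng.normal(size=rows.size)
x = rng.normal(size=5)
dy = rng.normal(size=3)                      # upstream gradient dL/dy

y = sparse_forward(rows, cols, vals, x, n_out=3)
dvals, dx = sparse_backward(rows, cols, vals, x, dy)
```

Memory and work here scale with the number of connections (5) rather than with the full matrix size (15).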

How to understand the term `tensor` in TensorFlow?

I am new to TensorFlow. While I am reading the existing documentation, I found the term tensor really confusing. Because of it, I need to clarify the following questions:
What is the relationship between tensor and Variable, tensor vs. tf.constant, tensor vs. tf.placeholder?
Are they all types of tensors?

TensorFlow doesn't have first-class Tensor objects, meaning that there is no notion of a Tensor in the underlying graph that the runtime executes. Instead, the graph consists of op nodes connected to each other, representing operations. An operation allocates memory for its outputs, which are available on endpoints :0, :1, etc., and you can think of each of these endpoints as a Tensor. If you have a tensor corresponding to nodename:0, you can fetch its value as sess.run(tensor) or sess.run('nodename:0'). Execution happens at the granularity of operations, so the run method executes the op, which computes all of its endpoints, not just the :0 endpoint. It's possible to have an Op node with no outputs (like tf.group), in which case there are no tensors associated with it. It is not possible to have tensors without an underlying Op node.
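You can examine what happens in the underlying graph by doing something like this:
import tensorflow.compat.v1 as tf  # TF1-style graph API, also available in TF 2.x
tf.disable_v2_behavior()
tf.reset_default_graph()
value = tf.constant(1)
print(tf.get_default_graph().as_graph_def())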
So with tf.constant you get a single operation node, and you can fetch it using sess.run("Const:0") or sess.run(value).
Similarly, value = tf.placeholder(tf.int32) creates a regular node named Placeholder, and you can feed it as feed_dict={"Placeholder:0": 2} or feed_dict={value: 2}. You cannot feed and fetch a placeholder in the same session.run call, but you can see the result by attaching a tf.identity node on top and fetching that.
For a variable:
tf.reset_default_graph()
value = tf.Variable(tf.ones_initializer()(()))
value2 = value+3
print(tf.get_default_graph().as_graph_def())
You'll see that it creates two nodes, Variable and Variable/read; the :0 endpoint is a valid value to fetch on both of them. However, Variable:0 has a special ref type, meaning it can be used as an input to mutating operations. The result of the Python call tf.Variable is a Python Variable object, and some Python magic substitutes Variable/read:0 or Variable:0 depending on whether mutation is necessary. Since most ops have only one endpoint, the :0 is dropped. Another example is Queue: its close() method creates a new Close op node that connects to the Queue op. To summarize, operations on Python objects like Variable and Queue map to different underlying TensorFlow op nodes depending on usage.
For ops like tf.split or tf.nn.top_k, which create nodes with multiple endpoints, Python's session.run call automatically wraps the output in a tuple or collections.namedtuple of Tensor objects, which can be fetched individually.

From the glossary:
A Tensor is a typed multi-dimensional array. For example, a 4-D array of floating point numbers representing a mini-batch of images with dimensions [batch, height, width, channel].
Basically, all data in TensorFlow is a Tensor (hence the name):
placeholders are Tensors to which you can feed a value (with the feed_dict argument in sess.run()).
Variables are Tensors which you can update (with var.assign()). Technically speaking, tf.Variable is not a subclass of tf.Tensor, though.
tf.constant is just the most basic Tensor, which contains a fixed value given when you create it.
However, in the graph, every node is an operation, which can have Tensors as inputs or outputs.

As already mentioned by others, yes they are all tensors.
The way I understood them was to first visualize 1D, 2D, 3D, 4D, 5D, and 6D tensors. [Figure: 1D to 6D tensors (source: knoldus)]
Now, in the context of TensorFlow, you can imagine a computation graph like this: [Figure: computation graph computing t3 from tensors a and b]
Here, the Ops take two tensors a and b as input, multiply each tensor by itself, and then add the results of these multiplications to produce the result tensor t3. These multiplication and addition Ops happen at the nodes of the computation graph.
The tensors a and b can be constant tensors, Variable tensors, or placeholders. It doesn't matter, as long as they have the same data type and compatible (or broadcastable) shapes for the operations.
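The graph described above can be mimicked in plain Python to show that the nodes are operations and the tensors are just the values flowing between them. The intermediate names t1 and t2 and the tiny evaluator are hypothetical, not TensorFlow's implementation; b is given a different shape to show broadcasting.

```python
import numpy as np

# Each node is (op, names of its inputs); a and b are fed in as values.
graph = {
    "t1": (np.multiply, ("a", "a")),
    "t2": (np.multiply, ("b", "b")),
    "t3": (np.add, ("t1", "t2")),
}

def run(node, feeds, cache=None):
    """Evaluate `node` by recursively evaluating the ops it depends on."""
    cache = {} if cache is None else cache
    if node in feeds:
        return feeds[node]
    if node not in cache:
        op, inputs = graph[node]
        cache[node] = op(*(run(name, feeds, cache) for name in inputs))
    return cache[node]

a = np.array([[1.0, 2.0], [3.0, 4.0]])   # shape (2, 2)
b = np.array([10.0, 20.0])               # shape (2,), broadcast against (2, 2)
t3 = run("t3", {"a": a, "b": b})
```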

Data is stored in matrices. A 28x28 pixel grayscale image fits into a
28x28 two-dimensional matrix. But for a color image, we need more
dimensions. There are 3 color values per pixel (Red, Green, Blue), so
a three-dimensional table will be needed with dimensions [28, 28, 3].
And to store a batch of 128 color images, a four-dimensional table is
needed with dimensions [128, 28, 28, 3].
These multi-dimensional tables are called "tensors" and the list of
their dimensions is their "shape".
Source

TensorFlow's central data type is the tensor. Tensors are the underlying components of computation and a fundamental data structure in TensorFlow. Without using complex mathematical interpretations, we can say that a tensor (in TensorFlow) describes a multidimensional numerical array: a collection of data with zero or more dimensions, characterized by its rank, shape, and type.
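Rank, shape, and type map directly onto array attributes. NumPy arrays are used here as a stand-in for tf.Tensor, which exposes .shape and .dtype in the same way; the arrays reuse the image sizes from the quoted example.

```python
import numpy as np

scalar = np.array(3.0)                                  # rank 0: shape ()
gray = np.zeros((28, 28), dtype=np.float32)             # rank 2: one grayscale image
color = np.zeros((28, 28, 3), dtype=np.float32)         # rank 3: one RGB image
batch = np.zeros((128, 28, 28, 3), dtype=np.float32)    # rank 4: a batch of RGB images

for name, t in [("scalar", scalar), ("gray", gray), ("color", color), ("batch", batch)]:
    print(name, "rank:", t.ndim, "shape:", t.shape, "dtype:", t.dtype)
```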

Image Segmentation with TensorFlow

I am trying to see whether it is feasible to use TensorFlow to identify features in my image data. I have 50x50px grayscale images of nuclei that I would like to segment; the desired output would be either 0 or 1 for each pixel: 0 for the background, 1 for the nucleus.
Example input: [image: raw input data]
Example label (what the "label"/real answer would be): [image: output data (label)]
Is it even possible to use TensorFlow to perform this type of machine learning on my dataset? I could potentially have thousands of images for the training set.
A lot of the examples have a label corresponding to a single category, for example a 10-element array [0,0,0,0,0,0,0,0,0,0] for the handwritten digit data set, but I haven't seen many examples that output a larger array. I assume the label would be a 50x50 array?
Also, any ideas on the CPU processing time for this type of analysis?

Yes, this is possible with TensorFlow. In fact, there are many ways to approach it. Here's a very simple one:
Consider this to be a binary classification task. Each pixel needs to be classified as foreground or background. Choose a set of features by which each pixel will be classified. These features could be local features (such as a patch around the pixel in question) or global features (such as the pixel's location in the image). Or a combination of the two.
Then train a model of your choosing (such as a NN) on this dataset. Of course, your results will be highly dependent on your choice of features.
You could also take a graph-cut approach if you can represent that computation as a computational graph using the primitives that TensorFlow provides. You could then either skip TensorFlow's optimization functions (such as backprop) or, if some variables in your computation are differentiable, use TF's optimization functions to optimize those variables.
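A sketch of the per-pixel feature idea in NumPy, using a 5x5 patch around each pixel as the local feature and the normalized (row, column) position as the global feature. The patch radius, the reflect padding at the borders, and the thresholded stand-in label are assumptions for illustration; the resulting X and y could then be fed to any per-pixel classifier.

```python
import numpy as np

def pixel_features(img, radius=2):
    """Per-pixel feature vectors: a (2r+1)x(2r+1) patch plus normalized (row, col)."""
    h, w = img.shape
    padded = np.pad(img, radius, mode="reflect")   # so border pixels get full patches
    size = 2 * radius + 1
    feats = []
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + size, j:j + size].ravel()   # local feature, centered on (i, j)
            loc = np.array([i / (h - 1), j / (w - 1)])        # global feature: position
            feats.append(np.concatenate([patch, loc]))
    return np.stack(feats)                                    # shape (h*w, size*size + 2)

img = np.random.default_rng(0).random((50, 50))
mask = (img > 0.5).astype(np.int64)   # stand-in label: 0 background, 1 nucleus
X = pixel_features(img)               # one row per pixel
y = mask.ravel()                      # one 0/1 target per pixel
```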

SoftmaxWithLoss() works for your image segmentation problem if you reshape the predicted label map and the true label map from [batch, height, width, channel] to [N, channel].
In your case, your final predicted map will have channel = 2, and after reshaping, N = batch * height * width; then you can use SoftmaxWithLoss() or a similar loss function in TensorFlow to run the optimization.
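The reshape-then-softmax idea can be sketched in NumPy: flatten [batch, height, width, channel] logits to [N, channel] and average the cross-entropy over all N pixels. In TensorFlow, tf.nn.sparse_softmax_cross_entropy_with_logits computes the per-pixel term; the function below is a hypothetical reference implementation, not that API.

```python
import numpy as np

def pixelwise_softmax_loss(logits, labels):
    """Mean softmax cross-entropy over every pixel.

    logits: [batch, height, width, channel]; labels: [batch, height, width] class ids.
    """
    n_classes = logits.shape[-1]
    flat_logits = logits.reshape(-1, n_classes)     # [N, channel], N = batch*height*width
    flat_labels = labels.reshape(-1)                # [N]
    shifted = flat_logits - flat_logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(flat_labels.size), flat_labels].mean()

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=(2, 50, 50))        # 0 = background, 1 = nucleus
uninformed = np.zeros((2, 50, 50, 2))                # equal logits for both classes
confident = np.where(labels[..., None] == np.arange(2), 10.0, -10.0)
loss_uninformed = pixelwise_softmax_loss(uninformed, labels)   # log(2)
loss_confident = pixelwise_softmax_loss(confident, labels)     # close to 0
```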
See this question that may help.

Try using convolutional filters for the model: a stack of convolution and downsampling layers. The input should be the normalized pixel image and the output should be the mask. The last layer should be a softmaxWithLoss. HTH.
