tf.nn.conv2d vs tf.layers.conv2d - python

Is there any advantage in using tf.nn.* over tf.layers.*?
Most of the examples in the doc use tf.nn.conv2d, for instance, but it is not clear why they do so.

As GBY mentioned, they use the same implementation.
There is a slight difference in the parameters.
For tf.nn.conv2d:
filter: A Tensor. Must have the same type as input. A 4-D tensor of shape [filter_height, filter_width, in_channels, out_channels]
For tf.layers.conv2d:
filters: Integer, the dimensionality of the output space (i.e. the number of filters in the convolution).
I would use tf.nn.conv2d when loading a pretrained model (example code: https://github.com/ry/tensorflow-vgg16), and tf.layers.conv2d for a model trained from scratch.

For convolution, they are the same. More precisely, tf.layers.conv2d (actually _Conv) uses tf.nn.convolution as the backend. You can follow the calling chain: tf.layers.conv2d > Conv2D > Conv2D.apply() > _Conv > _Conv.apply() > _Layer.apply() > _Layer.__call__() > _Conv.call() > nn.convolution() ...

As others mentioned, the parameters are different, especially the "filter(s)". tf.nn.conv2d takes a tensor as its filter, which means you can specify weight decay (or maybe other properties) like the following from the cifar10 code. (Whether you want/need weight decay in the conv layer is another question.)
kernel = _variable_with_weight_decay('weights',
                                     shape=[5, 5, 3, 64],
                                     stddev=5e-2,
                                     wd=0.0)
conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
I'm not quite sure how to set weight decay in tf.layers.conv2d, since it only takes an integer for filters. Maybe using kernel_constraint?
On the other hand, tf.layers.conv2d handles activation and bias automatically, while you have to write additional code for these if you use tf.nn.conv2d.
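As a follow-up to the weight decay question: tf.layers.conv2d also accepts a kernel_regularizer argument, which seems to be the closer fit. A minimal sketch (the input placeholder, the 0.0005 scale, and the stand-in base loss are only illustrative, not from the answer above):

import tensorflow as tf

# hypothetical input batch of RGB images
images = tf.placeholder(tf.float32, [None, 32, 32, 3])

# weight decay expressed as an L2 regularizer on the kernel
conv = tf.layers.conv2d(images, filters=64, kernel_size=5, padding='same',
                        kernel_regularizer=tf.contrib.layers.l2_regularizer(scale=0.0005))

# the regularization terms land in this collection and must be added to the training loss
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
base_loss = tf.reduce_mean(tf.square(conv))  # stand-in for the real task loss
total_loss = base_loss + tf.add_n(reg_losses)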

All of these other replies talk about how the parameters are different, but actually, the main difference between tf.nn.conv2d and tf.layers.conv2d is that for tf.nn you need to create your own filter tensor and pass it in. This filter needs to have the shape [kernel_height, kernel_width, in_channels, num_filters].
Essentially, tf.nn is lower level than tf.layers. Unfortunately, this answer is no longer applicable, as tf.layers is obsolete.

DIFFERENCES IN PARAMETERS:
Using tf.layers.* in code:
# Convolution Layer with 32 filters and a kernel size of 5
conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
# Max Pooling (down-sampling) with strides of 2 and kernel size of 2
conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
Using tf.nn.* in code:
(Notice we need to pass weights and biases as additional parameters.)
strides = 1
# Weights matrix looks like: [kernel_size (=5), kernel_size (=5), input_channels (=3), filters (=32)]
# Similarly, bias looks like: [filters (=32)]
out = tf.nn.conv2d(input, weights, padding="SAME", strides=[1, strides, strides, 1])
out = tf.nn.bias_add(out, bias)
out = tf.nn.relu(out)
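For completeness, the weights and bias referenced above have to be created explicitly, for example like this (variable names and initializers are only illustrative):

# filter of shape [kernel_height, kernel_width, in_channels, num_filters]
weights = tf.get_variable('weights', shape=[5, 5, 3, 32],
                          initializer=tf.truncated_normal_initializer(stddev=0.1))
bias = tf.get_variable('bias', shape=[32],
                       initializer=tf.zeros_initializer())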

Take a look here: tensorflow > tf.layers.conv2d
and here: tensorflow > conv2d
As you can see the arguments to the layers version are:
tf.layers.conv2d(inputs, filters, kernel_size, strides=(1, 1), padding='valid', data_format='channels_last', dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer=None, bias_initializer=tf.zeros_initializer(), kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, trainable=True, name=None, reuse=None)
and the nn version:
tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None)
I think you can choose the one with the options you want/need/like!

Related

Quantization aware training in TensorFlow version 2 and BatchNorm folding

I'm wondering what the current available options are for simulating BatchNorm folding during quantization aware training in Tensorflow 2. Tensorflow 1 has the tf.contrib.quantize.create_training_graph function which inserts FakeQuantization layers into the graph and takes care of simulating batch normalization folding (according to this white paper).
Tensorflow 2 has a tutorial on how to use quantization in their recently adopted tf.keras API, but they don't mention anything about batch normalization. I tried the following simple example with a BatchNorm layer:
import tensorflow_model_optimization as tfmo

model = tf.keras.Sequential([
    l.Conv2D(32, 5, padding='same', activation='relu', input_shape=input_shape),
    l.MaxPooling2D((2, 2), (2, 2), padding='same'),
    l.Conv2D(64, 5, padding='same', activation='relu'),
    l.BatchNormalization(),  # BN!
    l.MaxPooling2D((2, 2), (2, 2), padding='same'),
    l.Flatten(),
    l.Dense(1024, activation='relu'),
    l.Dropout(0.4),
    l.Dense(num_classes),
    l.Softmax(),
])
model = tfmo.quantization.keras.quantize_model(model)
It however gives the following exception:
RuntimeError: Layer batch_normalization:<class 'tensorflow.python.keras.layers.normalization.BatchNormalization'> is not supported. You can quantize this layer by passing a `tfmot.quantization.keras.QuantizeConfig` instance to the `quantize_annotate_layer` API.
which indicates that TF does not know what to do with it.
I also saw this related topic where they apply tf.contrib.quantize.create_training_graph on a keras constructed model. They however don't use BatchNorm layers, so I'm not sure this will work.
So what are the options for using this BatchNorm folding feature in TF2? Can this be done from the keras API, or should I switch back to the TensorFlow 1 API and define a graph the old way?
If you add BatchNormalization before the activation, you will not have issues with quantization. Note: quantization is supported for BatchNormalization only if the layer comes directly after a Conv2D layer.
https://www.tensorflow.org/model_optimization/guide/quantization/training
# Change
l.Conv2D(64, 5, padding='same', activation='relu'),
l.BatchNormalization(),  # BN!

# to this
l.Conv2D(64, 5, padding='same'),
l.BatchNormalization(),
l.Activation('relu'),

# Another way of declaring the same
o = Conv2D(512, (3, 3), padding='valid', data_format=IMAGE_ORDERING)(o)
o = BatchNormalization()(o)
o = Activation('relu')(o)
You should apply the quantization annotation as described in the instructions. I think you can handle the BatchNorm now like this:
class DefaultBNQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    def get_weights_and_quantizers(self, layer):
        return []

    def get_activations_and_quantizers(self, layer):
        return []

    def set_quantize_weights(self, layer, quantize_weights):
        pass

    def set_quantize_activations(self, layer, quantize_activations):
        pass

    def get_output_quantizers(self, layer):
        return [tfmot.quantization.keras.quantizers.MovingAverageQuantizer(
            num_bits=8, per_axis=False, symmetric=False, narrow_range=False)]

    def get_config(self):
        return {}
If you still want to quantize the layer's weights, change the return of get_weights_and_quantizers to return [(layer.weights[i], LastValueQuantizer(num_bits=8, symmetric=True, narrow_range=False, per_axis=False)) for i in range(2)]. Then map the quantizers back to gamma, beta, ... according to the indices of that returned list in set_quantize_weights. However, I do not encourage this, as it will surely harm accuracy; BN should act as activation quantization.
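To actually attach this config, the annotation step could look roughly like this (a hedged sketch: the model is abbreviated to a single Conv-BN-ReLU block and the input shape is only a placeholder):

import tensorflow as tf
import tensorflow_model_optimization as tfmot

annotated_model = tfmot.quantization.keras.quantize_annotate_model(
    tf.keras.Sequential([
        tf.keras.layers.Conv2D(64, 5, padding='same', input_shape=(28, 28, 1)),
        tfmot.quantization.keras.quantize_annotate_layer(
            tf.keras.layers.BatchNormalization(),
            quantize_config=DefaultBNQuantizeConfig()),
        tf.keras.layers.Activation('relu'),
    ]))

# the custom config must be in scope while quantize_apply clones the model
with tfmot.quantization.keras.quantize_scope(
        {'DefaultBNQuantizeConfig': DefaultBNQuantizeConfig}):
    quant_model = tfmot.quantization.keras.quantize_apply(annotated_model)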
The result you would get looks like this (ResNet50 plot, not reproduced here).

How to use Padding in conv2d layer of specific size

My input image size is 256 x 256.
The Conv2D kernel size is 4 x 4 with strides of 2 x 2.
The output will be 127 x 127.
I want to pass this to max pooling; for that I want to apply padding to make it 128 x 128, so that pooling works well and the pooling output can be used in other layers.
How can I apply padding for this conv?
conv1 = tf.layers.conv2d(x, 32, (4,4),strides=(2,2), activation=tf.nn.relu)
tf.layers.conv2d has a padding parameter that you can use to do this. The default is "valid", which means no padding is done, so each convolution slightly shrinks the input. You can pass padding="same" instead. This applies padding such that the output of the convolution is equal in size to the input. This is before striding, so a stride of 2 will still downsample by a factor of 2. In your example, using padding="same" should result in a convolution output of size 128 x 128.
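For example, applied to the line from the question (x being the 256 x 256 input):

conv1 = tf.layers.conv2d(x, 32, (4, 4), strides=(2, 2),
                         padding='same', activation=tf.nn.relu)
# 'same' padding with stride 2 gives ceil(256 / 2) = 128, so conv1 is 128 x 128 x 32
conv1 = tf.layers.max_pooling2d(conv1, 2, 2)  # pooling then yields 64 x 64 x 32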

Custom function in Lambda layer fails, cannot convert tensor to numpy

So I am trying to implement a custom function using Lambda layers in Keras (Tensorflow backend).
I want to convert the input Tensor into a numpy array to perform my function. However, I cannot run tensor.eval() as it throws an error:
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'input_1' with dtype float and shape [?,960,960,1]
This is my code:
def tensor2np(tensor):
    return tensor.eval(session=K.get_session())

def np2tensor(np):
    return tf.convert_to_tensor(np.reshape((1, 480, 480, 3)))

def calculate_dwt1(tensor):
    np_input = tensor2np(tensor)
    coeff = pywt.wavedec2((np_input[0, :, :, 0]), 'db1', level=1)
    return np2tensor(np.dstack((coeff[1][0], coeff[1][1], coeff[1][2])))
def network():
    input = Input(shape=(960, 960, 1), dtype='float32')
    conv1 = Convolution2D(64, (3, 3), activation='relu', padding='same')(input)
    conv1 = Convolution2D(64, (3, 3), activation='relu', padding='same')(conv1)
    pool1 = MaxPooling2D((2, 2), strides=(2, 2))(conv1)
    conv2 = Convolution2D(128, (3, 3), activation='relu', padding='same')(pool1)
    conv2 = Convolution2D(128, (3, 3), activation='relu', padding='same')(conv2)
    lambda1 = Lambda(calculate_dwt1)(input)
    me = merge((lambda1, conv2), mode='concat', concat_axis=3)
    ..
    ..
Or is there any way I can get the result of the custom function at runtime, convert it to a Tensor, and feed it into my network?
Basically, I'm trying to implement this model architecture.
As it is, you're asking your network to backpropagate through a) the array-to-tensor transformation and b) a black-box function that operates on arrays. It's no surprise that it's unable to do that. You will need to rewrite your custom function using standard (or custom) TF/Keras operations and have it operate on tensor objects. Then, and only then, will it be able to propagate gradients backwards and values forward.
If you want to use a pure python function as a TensorFlow operation, you can use tf.py_func.
In your case, you need to use a custom python function as a loss function instead of built-in operations. TensorFlow's built-in operations are symbolic and compiled before execution. Then TensorFlow optimizes the given cost function by using its gradients. As your custom loss function's gradient is unknown, TensorFlow cannot optimize your custom loss function.
You have two options. You can either define your custom function in a more symbolic way in order to utilize TF's automatic differentiation, or you need to provide your pure python function's gradient externally like this.
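For the tf.py_func route, a minimal sketch might look like the following (shapes assume the level-1 'db1' decomposition of a 960 x 960 input from the question; names like dwt_np and dwt_layer are only illustrative, and note that no gradient flows through py_func, so layers feeding into it would not be trained):

import numpy as np
import pywt
import tensorflow as tf
from keras.layers import Lambda

def dwt_np(x):
    # x arrives as a numpy array of shape (batch, 960, 960, 1); only the first
    # sample is used, mirroring the code in the question
    coeff = pywt.wavedec2(x[0, :, :, 0], 'db1', level=1)
    out = np.dstack((coeff[1][0], coeff[1][1], coeff[1][2]))
    return out[np.newaxis, ...].astype(np.float32)  # (1, 480, 480, 3)

def dwt_layer(tensor):
    out = tf.py_func(dwt_np, [tensor], tf.float32)
    out.set_shape((None, 480, 480, 3))  # py_func loses the static shape information
    return out

# 'input' is the Input tensor from the question's network()
lambda1 = Lambda(dwt_layer)(input)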

Can not use both bias and batch normalization in convolution layers

I use the slim framework for TensorFlow because of its simplicity.
But I want to have a convolutional layer with both biases and batch normalization.
In vanilla tensorflow, I have:
def conv2d(input_, output_dim, k_h=5, k_w=5, d_h=2, d_w=2, name="conv2d"):
    with tf.variable_scope(name):
        w = tf.get_variable('w', [k_h, k_w, input_.get_shape()[-1], output_dim],
                            initializer=tf.contrib.layers.xavier_initializer(uniform=False))
        conv = tf.nn.conv2d(input_, w, strides=[1, d_h, d_w, 1], padding='SAME')
        biases = tf.get_variable('biases', [output_dim], initializer=tf.constant_initializer(0.0))
        conv = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape())
        tf.summary.histogram("weights", w)
        tf.summary.histogram("biases", biases)
        return conv
d_bn1 = BatchNorm(name='d_bn1')
h1 = lrelu(d_bn1(conv2d(h0, df_dim + y_dim, name='d_h1_conv')))
and I rewrote it in slim like this:
h1 = slim.conv2d(h0,
                 num_outputs=self.df_dim + self.y_dim,
                 scope='d_h1_conv',
                 kernel_size=[5, 5],
                 stride=[2, 2],
                 activation_fn=lrelu,
                 normalizer_fn=layers.batch_norm,
                 normalizer_params=batch_norm_params,
                 weights_initializer=layers.xavier_initializer(uniform=False),
                 biases_initializer=tf.constant_initializer(0.0))
But this code does not add a bias to the conv layer.
That is because of https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/layers/python/layers/layers.py#L1025 where there is
layer = layer_class(filters=num_outputs,
                    kernel_size=kernel_size,
                    strides=stride,
                    padding=padding,
                    data_format=df,
                    dilation_rate=rate,
                    activation=None,
                    use_bias=not normalizer_fn and biases_initializer,
                    kernel_initializer=weights_initializer,
                    bias_initializer=biases_initializer,
                    kernel_regularizer=weights_regularizer,
                    bias_regularizer=biases_regularizer,
                    activity_regularizer=None,
                    trainable=trainable,
                    name=sc.name,
                    dtype=inputs.dtype.base_dtype,
                    _scope=sc,
                    _reuse=reuse)
outputs = layer.apply(inputs)
in the construction of the layer, which results in not having a bias when using batch normalization.
Does that mean that I cannot have both biases and batch normalization using the slim and layers libraries? Or is there another way to get both bias and batch normalization in a layer when using slim?
BatchNormalization already includes the addition of a bias term. Recall that BatchNorm is already:
gamma * normalized(x) + bias
So there is no need (and it makes no sense) to add another bias term in the convolution layer. Simply put, BatchNorm shifts the activations by their mean value. Hence, any constant will be canceled out.
If you still want to do this, you need to remove the normalizer_fn argument and add BatchNorm as a single layer. Like I said, this makes no sense.
But the solution would be something like
net = slim.conv2d(net, normalizer_fn=None, ...)
net = tf.nn.batch_normalization(net)
Note that BatchNorm relies on non-gradient updates, so you either need to use an optimizer that is compatible with the UPDATE_OPS collection, or you need to add tf.control_dependencies manually.
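The usual control_dependencies pattern looks like this (standard TF1 boilerplate rather than slim-specific code; the loss tensor and the particular optimizer are placeholders):

# make the train step depend on the batch-norm moving-average updates
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)  # loss: your training loss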
Long story short: even if you implement ConvWithBias+BatchNorm, it will behave like ConvWithoutBias+BatchNorm, just as multiple fully-connected layers without activation functions behave like a single one.
The reason there is no bias for our convolutional layers is that we have batch normalization applied to their outputs. The goal of batch normalization is to get outputs with:
mean = 0
standard deviation = 1
Since we want the mean to be 0, we do not want to add an offset (bias) that will deviate from 0. We want the outputs of our convolutional layer to rely only on the coefficient weights.
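A quick numerical illustration of this cancellation (not from the original answer):

import numpy as np

x = np.random.randn(1000)

def normalize(v):
    return (v - v.mean()) / v.std()

# adding a constant offset (bias) before normalization changes nothing
print(np.allclose(normalize(x), normalize(x + 3.0)))  # True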

conv2d_transpose is dependent on batch_size when making predictions

I have a neural network currently implemented in TensorFlow, but I am having a problem making predictions after training, because I have conv2d_transpose operations whose shapes depend on the batch size. I have a layer that requires output_shape as an argument:
def deconvLayer(input, filter_shape, output_shape, strides):
    W1_1 = weight_variable(filter_shape)
    output = tf.nn.conv2d_transpose(input, W1_1, output_shape, strides, padding="SAME")
    return output
That is actually used in a larger model I have constructed like the following:
conv3 = layers.convLayer(conv2['layer_output'], [3, 3, 64, 128], use_pool=False)
conv4 = layers.deconvLayer(conv3['layer_output'],
                           filter_shape=[2, 2, 64, 128],
                           output_shape=[batch_size, 32, 40, 64],
                           strides=[1, 2, 2, 1])
The problem is, if I go to make a prediction using the trained model, my test data has to have the same batch size, or else I get the following error.
tensorflow.python.framework.errors.InvalidArgumentError: Conv2DBackpropInput: input and out_backprop must have the same batch size
Is there some way that I can get a prediction for an input with variable batch size? When I look at the trained weights, nothing seems to depend on batch size, so I can't see why this would be a problem.
So I came across a solution based on the issues forum of tensorflow at https://github.com/tensorflow/tensorflow/issues/833.
In my code
conv4 = layers.deconvLayer(conv3['layer_output'],
                           filter_shape=[2, 2, 64, 128],
                           output_shape=[batch_size, 32, 40, 64],
                           strides=[1, 2, 2, 1])
the output shape that gets passed to deconvLayer was hard-coded with a predetermined batch size during training. By altering this to the following:
def deconvLayer(input, filter_shape, output_shape, strides):
    W1_1 = weight_variable(filter_shape)
    dyn_input_shape = tf.shape(input)
    batch_size = dyn_input_shape[0]
    # tf.pack was later renamed tf.stack in TensorFlow 1.0
    output_shape = tf.pack([batch_size, output_shape[1], output_shape[2], output_shape[3]])
    output = tf.nn.conv2d_transpose(input, W1_1, output_shape, strides, padding="SAME")
    return output
This allows the shape to be dynamically inferred at run time and handles a variable batch size.
Running the code, I no longer receive this error when passing in test data of any batch size. I believe this is necessary because shape inference for transpose ops is currently not as straightforward as it is for normal convolutional ops. So where we would usually use None for the batch_size in normal convolutional ops, we must provide a shape, and since this can vary with the input, we must go through the effort of determining it dynamically.
