I'm wondering what options are currently available for simulating BatchNorm folding during quantization-aware training in TensorFlow 2. TensorFlow 1 has the tf.contrib.quantize.create_training_graph function, which inserts FakeQuantization layers into the graph and takes care of simulating batch normalization folding (according to this white paper).
TensorFlow 2 has a tutorial on how to use quantization in the recently adopted tf.keras API, but it doesn't mention anything about batch normalization. I tried the following simple example with a BatchNorm layer:
import tensorflow as tf
import tensorflow_model_optimization as tfmo

l = tf.keras.layers  # alias used below; input_shape and num_classes are defined elsewhere

model = tf.keras.Sequential([
    l.Conv2D(32, 5, padding='same', activation='relu', input_shape=input_shape),
    l.MaxPooling2D((2, 2), (2, 2), padding='same'),
    l.Conv2D(64, 5, padding='same', activation='relu'),
    l.BatchNormalization(),  # BN!
    l.MaxPooling2D((2, 2), (2, 2), padding='same'),
    l.Flatten(),
    l.Dense(1024, activation='relu'),
    l.Dropout(0.4),
    l.Dense(num_classes),
    l.Softmax(),
])
model = tfmo.quantization.keras.quantize_model(model)
It however gives the following exception:
RuntimeError: Layer batch_normalization:<class 'tensorflow.python.keras.layers.normalization.BatchNormalization'> is not supported. You can quantize this layer by passing a `tfmot.quantization.keras.QuantizeConfig` instance to the `quantize_annotate_layer` API.
which indicates that TF does not know what to do with it.
I also saw this related topic where they apply tf.contrib.quantize.create_training_graph on a keras constructed model. They however don't use BatchNorm layers, so I'm not sure this will work.
So what are the options for using this BatchNorm folding feature in TF2? Can this be done from the keras API, or should I switch back to the TensorFlow 1 API and define a graph the old way?
If you add BatchNormalization before the activation, you should not have issues with quantization. Note: quantization of BatchNormalization is supported only if the layer comes directly after a Conv2D layer.
https://www.tensorflow.org/model_optimization/guide/quantization/training
# Change
l.Conv2D(64, 5, padding='same', activation='relu'),
l.BatchNormalization(), # BN!
# with this
l.Conv2D(64, 5, padding='same'),
l.BatchNormalization(),
l.Activation('relu'),
# Another way of declaring the same
o = Conv2D(512, (3, 3), padding='valid', data_format=IMAGE_ORDERING)(o)
o = BatchNormalization()(o)
o = Activation('relu')(o)
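For context on what "folding" means here: at inference time, a BatchNormalization that directly follows a convolution can be merged into that convolution's weights and bias, which is why the Conv2D, BatchNormalization, activation order matters. Below is a minimal NumPy sketch of the arithmetic, using a 1x1 convolution written as a matrix product. The function name fold_batchnorm and the eps value are illustrative assumptions, not part of any TensorFlow API.

```python
import numpy as np

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-3):
    """Fold BN parameters into the preceding linear/conv layer.

    w has shape (in_channels, out_channels); all BN parameters are
    per output channel. Returns (w_folded, b_folded).
    """
    scale = gamma / np.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))    # 4 "pixels", 8 input channels
w = rng.normal(size=(8, 16))   # 1x1 conv kernel as a matrix
b = rng.normal(size=16)
gamma, beta = rng.normal(size=16), rng.normal(size=16)
mean, var = rng.normal(size=16), rng.uniform(0.5, 2.0, size=16)
eps = 1e-3

# Reference: convolution followed by BN in inference mode.
y_ref = gamma * ((x @ w + b) - mean) / np.sqrt(var + eps) + beta

# Folded: a single convolution with adjusted weights and bias.
w_f, b_f = fold_batchnorm(w, b, gamma, beta, mean, var, eps)
y_folded = x @ w_f + b_f

print(np.allclose(y_ref, y_folded))
```

If a nonlinearity such as ReLU sits between the convolution and the BN, this algebra no longer applies, which is consistent with the advice to move the activation after BatchNormalization.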
You should apply the quantization annotation as described in the instructions. I think you can now handle the BatchNorm layer with a QuantizeConfig like this:
import tensorflow_model_optimization as tfmot

class DefaultBNQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    # No weights or internal activations are quantized; only the layer output is.
    def get_weights_and_quantizers(self, layer):
        return []

    def get_activations_and_quantizers(self, layer):
        return []

    def set_quantize_weights(self, layer, quantize_weights):
        pass

    def set_quantize_activations(self, layer, quantize_activations):
        pass

    def get_output_quantizers(self, layer):
        return [tfmot.quantization.keras.quantizers.MovingAverageQuantizer(
            num_bits=8, per_axis=False, symmetric=False, narrow_range=False)]

    def get_config(self):
        return {}
If you still want to quantize the layer's weights, change get_weights_and_quantizers to return [(layer.weights[i], LastValueQuantizer(num_bits=8, symmetric=True, narrow_range=False, per_axis=False)) for i in range(2)]. Then, in set_quantize_weights, assign the quantized values back to gamma, beta, ... according to the indices of that list. However, I don't encourage this approach, as it will almost certainly harm accuracy; BN should only act as an activation quantization here.
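To make the role of the output quantizer more concrete: during quantization-aware training the tensor stays in floating point, but is rounded onto the grid that 8-bit inference would use. The following is a rough NumPy sketch of asymmetric fake quantization. The function name fake_quantize and the fixed min/max range are assumptions for illustration; the real MovingAverageQuantizer tracks the range with a moving average during training, and this sketch also omits the zero-point nudging that TensorFlow performs.

```python
import numpy as np

def fake_quantize(x, x_min, x_max, num_bits=8):
    """Quantize x to num_bits levels in [x_min, x_max], then map back to float."""
    levels = 2 ** num_bits - 1
    scale = (x_max - x_min) / levels
    q = np.round((np.clip(x, x_min, x_max) - x_min) / scale)  # integer grid index
    return q * scale + x_min                                  # dequantized value

x = np.array([-1.5, -0.3, 0.0, 0.7, 2.4])
xq = fake_quantize(x, x_min=-1.0, x_max=2.0)

print(xq)
# The rounding error is at most half a quantization step for in-range values.
print(np.abs(xq - np.clip(x, -1.0, 2.0)).max())
```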
The result you have would be like this (RESNET50):
I am trying to build a reinforcement learning agent to learn off a custom environment, built to openai's gym specifications.
I have np arrays of size (20, 7) which I want to pass to the network, and output one of 7 actions.
I am having trouble building the actual network, as I want to include LSTM layers. My code is as follows:
def build_model():
    model = Sequential()
    model.add(LSTM(60, return_sequences=True, input_shape=(20, 7), activation='relu'))
    model.add(Dense(21, activation="relu"))
    model.add(Flatten())
    model.add(Dense(7, activation="linear"))
    model.compile(loss="mse", optimizer=Adam(lr=0.0002), metrics=['accuracy'])
    return model
However, when I build the agent, there is suddenly an extra dimension added on which the network does not expect:
def build_agent(model, actions):
    policy = BoltzmannQPolicy()
    memory = SequentialMemory(limit=50000, window_length=1)
    dqn = DQNAgent(model=model, memory=memory, policy=policy,
                   nb_actions=actions, nb_steps_warmup=10, target_model_update=1e-2)
    return dqn

dqn = build_agent(model, actions)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
dqn.fit(env, nb_steps=50000, visualize=False, verbose=1)
ValueError: Error when checking input: expected lstm_input to have 3 dimensions, but got array with shape (1, 1, 20, 7)
I'm not exactly sure why the agent is reshaping the data to add an extra dimension (or two?), but if anyone has an idea how to stop this from happening so I can train my network, I would be very grateful. My solution runs when I code it myself; however, I want to make use of the keras-rl2 library.
Thanks in advance!
For anyone looking for an answer, I fixed this by adding the following as the first layer:
model.add(Reshape((20, 7), input_shape=(1, 20, 7)))
As I understand it, your agent can handle your environment but not other keras-rl2 environments, because they add another dimension to the feature vector (the input). I believe this is because the environments you are trying to run return a feature vector that includes channels. Channels simply describe how many values you need to describe one pixel. For instance, RGB needs 3 channels, while your environment returns a simplification with only one channel.
Since you are only interested in one channel and do not need that axis, you could squeeze it away with:
state = state.squeeze(axis=1)
before passing it into the net. Or you could define your model to include channels by setting your input shape to (1, 20, 7), which could be useful if you later want to apply convolutional layers, which require a defined number of channels.
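A quick way to see what either fix does to the shapes is to reproduce the array from the error message in NumPy. This is only a sketch: the array is a zero-filled stand-in, and reading the axes as (batch, window_length, 20, 7) is my interpretation of the error message together with window_length=1 in the question.

```python
import numpy as np

# Shape reported in the error message.
batch = np.zeros((1, 1, 20, 7), dtype=np.float32)

# Option 1: drop the size-1 second axis.
squeezed = batch.squeeze(axis=1)

# Option 2: reshape each sample to (20, 7), which is what a
# Reshape((20, 7)) layer does per sample.
reshaped = batch.reshape((-1, 20, 7))

print(squeezed.shape, reshaped.shape)
```

Both results have three dimensions, which is what the LSTM layer expects. Note that squeeze(axis=1) raises an error if that axis is larger than 1, so it only applies when the extra axis really has size 1.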
Try this as the first layer:
window_length = 1
model.add(Flatten(input_shape=(window_length,) + env.observation_space.shape))
Is it possible to access pre-activation tensors in a Keras Model? For example, given this model:
import tensorflow as tf
image_ = tf.keras.Input(shape=[224, 224, 3], batch_size=1)
vgg19 = tf.keras.applications.VGG19(include_top=False, weights='imagenet', input_tensor=image_, input_shape=image_.shape[1:], pooling=None)
the usual way to access layers is:
intermediate_layer_model = tf.keras.models.Model(inputs=image_, outputs=[vgg19.get_layer('block1_conv2').output])
intermediate_layer_model.summary()
This gives the ReLU outputs for a layer, while I would like the ReLU inputs. I tried doing this:
graph = tf.function(vgg19, [tf.TensorSpec.from_tensor(image_)]).get_concrete_function().graph
outputs = [graph.get_tensor_by_name(tname) for tname in [
'vgg19/block4_conv3/BiasAdd:0',
'vgg19/block4_conv4/BiasAdd:0',
'vgg19/block5_conv1/BiasAdd:0'
]]
intermediate_layer_model = tf.keras.models.Model(inputs=image_, outputs=outputs)
intermediate_layer_model.summary()
but I get the error
ValueError: Unknown graph. Aborting.
The only workaround I've found is to edit the model file to manually expose the intermediates, turning every layer like this:
x = layers.Conv2D(256, (3, 3), activation="relu", padding="same", name="block3_conv1")(x)
into 2 layers where the 1st one can be accessed before activations:
x = layers.Conv2D(256, (3, 3), activation=None, padding="same", name="block3_conv1")(x)
x = layers.ReLU(name="block3_conv1_relu")(x)
Is there a way to access pre-activation tensors in a Model without essentially editing TensorFlow 2 source code, or reverting to TensorFlow 1, which had full flexibility in accessing intermediates?
There is a way to access pre-activation layers for pretrained Keras models using TF version 2.7.0. Here's how to access two intermediate pre-activation outputs from VGG19 in a single forward pass.
Initialize VGG19 model. We can omit top layers to avoid loading unnecessary parameters into memory.
vgg19 = tf.keras.applications.VGG19(
include_top=False,
weights="imagenet"
)
This is the important part: create a deepcopy of the intermediate layer from which you would like to have the features, change the activation of the copied conv layer to linear (i.e. no activation), rename the layer (otherwise two layers in the model will have the same name, which raises errors), and finally pass the output of the previous layer through the copied conv layer.
from copy import deepcopy

# for more intermediate features wrap a loop around it to avoid copy paste
b5c4_layer = deepcopy(vgg19.get_layer("block5_conv4"))
b5c4_layer.activation = tf.keras.activations.linear
b5c4_layer._name = b5c4_layer.name + "_preact"
b5c4_preact_output = b5c4_layer(vgg19.get_layer("block5_conv3").output)

b2c2_layer = deepcopy(vgg19.get_layer("block2_conv2"))
b2c2_layer.activation = tf.keras.activations.linear
b2c2_layer._name = b2c2_layer.name + "_preact"
b2c2_preact_output = b2c2_layer(vgg19.get_layer("block2_conv1").output)
Finally, get the outputs and check if they equal post-activation outputs when we apply ReLU-activation.
import numpy as np
from tensorflow.keras import Model

# img is assumed to be an image batch of shape (1, H, W, 3) loaded elsewhere
vgg19_features = Model(vgg19.input, [b2c2_preact_output, b5c4_preact_output])
vgg19_features_control = Model(vgg19.input, [vgg19.get_layer("block2_conv2").output, vgg19.get_layer("block5_conv4").output])
b2c2_preact, b5c4_preact = vgg19_features(tf.keras.applications.vgg19.preprocess_input(img))
b2c2, b5c4 = vgg19_features_control(tf.keras.applications.vgg19.preprocess_input(img))
print(np.allclose(tf.keras.activations.relu(b2c2_preact).numpy(), b2c2.numpy()))
print(np.allclose(tf.keras.activations.relu(b5c4_preact).numpy(), b5c4.numpy()))
True
True
Here's a visualization similar to Fig. 6 of Wang et al. to see the effect in the feature space.
To get the output of each layer, you have to define a Keras function and evaluate it for each layer.
Please refer to the code shown below:
from tensorflow.keras import backend as K
inp = model.input # input
outputs = [layer.output for layer in model.layers] # all layer outputs
functors = [K.function([inp], [out]) for out in outputs] # evaluation functions
For more details on this, please refer to the SO answer.
I'm working on converting an old project written in TensorFlow v1.13 to PyTorch v1.4.0, and I noticed that TensorFlow and PyTorch had different-sized weight tensors for the 2D CNNs.
Here is my tensorflow code
import tensorflow as tf  # TensorFlow 1.x API

cnn = tf.layers.conv2d(img_tensor, 16, (3, 3), (1, 1), padding='SAME', name='cnn_1')
cnn = tf.layers.conv2d(cnn, 32, (3, 3), (1, 1), padding='SAME', name='cnn_2')
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    vars = {v.name: v for v in tf.trainable_variables()}
    print(sess.run(vars['cnn_2/kernel:0']).shape)
Result
(3, 3, 1, 32)
Here is my pytorch code
from torch.nn import Conv2d, Module, Sequential

class Net(Module):
    def __init__(self):
        super(Net, self).__init__()
        self.create_cnn()

    def create_cnn(self):
        self.cnn_layers = Sequential(
            Conv2d(1, 16, 3, padding=1),
            Conv2d(16, 32, 3, padding=1),
        )

    def forward(self, x):
        return self.cnn_layers(x)

def weights_init(m):
    # Print the weight shape of the layer with 32 output channels
    if type(m) == Conv2d:
        if m.bias.shape[0] == 32:
            print(m.weight.data.shape)

model = Net()
model.apply(weights_init)
Result
torch.Size([32,16,3,3])
The reason this came up is that my PyTorch model is not working, so I started going one layer at a time, comparing outputs between TensorFlow and PyTorch. For that to work, I had to set the weights on both models to the same values. When I got to the 2nd CNN layer, I was confused when setting the weights failed because the size was wrong. A little bit of poking around and I found this difference.
It looks like TensorFlow is using the same kernel across all the channels, whereas PyTorch has a unique kernel for each channel. If this is the case, how can I replicate this in PyTorch?
After re-reading the PyTorch docs I noticed that the groups argument is exactly related to this. That'll teach me not to skim over parts of the docs. By setting groups=in_channels I now get the size (32, 1, 3, 3) as desired.
Edit:
So, even more embarrassing: in my test code I was feeding my inputs into both CNN layers instead of daisy-chaining them. When I actually run the code as written above, the second CNN in TensorFlow does in fact have weights with size (3, 3, 16, 32).
But at least I learned about grouping.
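For anyone copying weights between the two frameworks, the remaining difference is the axis order: TensorFlow stores conv2d kernels as (height, width, in_channels, out_channels), while PyTorch's Conv2d stores weights as (out_channels, in_channels // groups, height, width). Here is a small NumPy sketch of the conversion. The helper torch_conv2d_weight_shape is a hypothetical name used only to illustrate the shape rule, and the kernel values are dummy data.

```python
import numpy as np

# TensorFlow conv2d kernel layout: (height, width, in_channels, out_channels)
tf_kernel = np.arange(3 * 3 * 16 * 32, dtype=np.float32).reshape(3, 3, 16, 32)

# PyTorch Conv2d weight layout: (out_channels, in_channels // groups, height, width)
torch_weight = np.transpose(tf_kernel, (3, 2, 0, 1))

def torch_conv2d_weight_shape(in_channels, out_channels, kernel_size, groups=1):
    """Shape of Conv2d.weight for a square kernel (illustrative helper)."""
    return (out_channels, in_channels // groups, kernel_size, kernel_size)

print(torch_weight.shape)
print(torch_conv2d_weight_shape(16, 32, 3))             # ordinary convolution
print(torch_conv2d_weight_shape(16, 32, 3, groups=16))  # grouped convolution
```

The transposed array could then be wrapped with torch.from_numpy and copied into the layer's weight; that step is left out here so the sketch stays NumPy-only.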
So I am trying to implement a custom function using Lambda layers in Keras (Tensorflow backend).
I want to convert the input Tensor into numpy array to perform my function. However, I cannot run tensor.eval() as it throws an error :
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'input_1' with dtype float and shape [?,960,960,1]
This is my code:
def tensor2np(tensor):
    return tensor.eval(session=K.get_session())

def np2tensor(np):
    return tf.convert_to_tensor(np.reshape((1,480,480,3)))

def calculate_dwt1(tensor):
    np_input = tensor2np(tensor)
    coeff = pywt.wavedec2((np_input[0,:,:,0]), 'db1', level=1)
    return np2tensor(np.dstack((coeff[1][0],coeff[1][1],coeff[1][2])))

def network():
    input = Input(shape=(960,960,1), dtype='float32')
    conv1 = Convolution2D(64, (3,3), activation='relu', padding='same')(input)
    conv1 = Convolution2D(64, (3,3), activation='relu', padding='same')(conv1)
    pool1 = MaxPooling2D((2,2), strides=(2,2))(conv1)
    conv2 = Convolution2D(128, (3, 3), activation='relu', padding='same')(pool1)
    conv2 = Convolution2D(128, (3, 3), activation='relu', padding='same')(conv2)
    lambda1 = Lambda(calculate_dwt1)(input)
    me = merge((lambda1, conv2),mode='concat', concat_axis=3)
    ..
    ..
Or is there any way I can get the result of the custom function at runtime, convert it to a Tensor, and feed it into my network?
Basically, I'm trying to implement this model architecture.
As it is, you're asking your network to backpropagate through a) the array-> tensor transformation and b) a blackbox function that operates on arrays. Obviously it's no surprise it's unable to do that. You will need to rewrite your custom function using standard (or custom) TF/K operations, and have it be applied on tensor objects. Then and only then will it be able to propagate gradients backwards and values forward.
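As a hint at what such a rewrite could look like: a single-level Haar ('db1') transform needs only strided slicing, addition and stacking, all of which have direct equivalents on TensorFlow tensors. The sketch below is written in NumPy so it can be run on its own. It is an assumption-laden illustration: the function name is made up, and the sign and ordering conventions of the three detail bands have not been checked against pywt.wavedec2, so they may differ.

```python
import numpy as np

def haar_dwt2_level1(x):
    """Single-level 2-D Haar transform of an array with even height and width.

    Returns the three detail bands stacked on the last axis,
    shape (H/2, W/2, 3).
    """
    a = x[0::2, 0::2]  # top-left of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    horizontal = (a + b - c - d) / 2.0
    vertical = (a - b + c - d) / 2.0
    diagonal = (a - b - c + d) / 2.0
    return np.stack([horizontal, vertical, diagonal], axis=-1)

image = np.random.default_rng(0).normal(size=(960, 960))
details = haar_dwt2_level1(image)
print(details.shape)
```

Because every step is plain arithmetic on slices, the same expressions applied to a tensor inside the Lambda layer would let gradients flow, and no eval() or session is needed while the graph is being built.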
If you want to use a pure Python function as a TensorFlow operation, you can use tf.py_func (tf.py_function or tf.numpy_function in TensorFlow 2).
In your case, you need to use a custom python function as a loss function instead of built-in operations. TensorFlow's built-in operations are symbolic and compiled before execution. Then TensorFlow optimizes the given cost function by using its gradients. As your custom loss function's gradient is unknown, TensorFlow cannot optimize your custom loss function.
You have two options. You can either define your custom function in a more symbolic way in order to utilize TF's automatic differentiation, or you need to provide your pure python function's gradient externally like this.
Is there any advantage in using tf.nn.* over tf.layers.*?
Most of the examples in the doc use tf.nn.conv2d, for instance, but it is not clear why they do so.
As GBY mentioned, they use the same implementation.
There is a slight difference in the parameters.
For tf.nn.conv2d:
filter: A Tensor. Must have the same type as input. A 4-D tensor of shape [filter_height, filter_width, in_channels, out_channels]
For tf.layers.conv2d:
filters: Integer, the dimensionality of the output space (i.e. the number of filters in the convolution).
I would use tf.nn.conv2d when loading a pretrained model (example code: https://github.com/ry/tensorflow-vgg16), and tf.layers.conv2d for a model trained from scratch.
For convolution, they are the same. More precisely, tf.layers.conv2d (actually _Conv) uses tf.nn.convolution as the backend. You can follow the calling chain: tf.layers.conv2d > Conv2D > Conv2D.apply() > _Conv > _Conv.apply() > _Layer.apply() > _Layer.__call__() > _Conv.call() > nn.convolution()...
As others mentioned the parameters are different especially the "filter(s)". tf.nn.conv2d takes a tensor as a filter, which means you can specify the weight decay (or maybe other properties) like the following in cifar10 code. (Whether you want/need to have weight decay in conv layer is another question.)
kernel = _variable_with_weight_decay('weights',
                                     shape=[5, 5, 3, 64],
                                     stddev=5e-2,
                                     wd=0.0)
conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
I'm not quite sure how to set weight decay in tf.layers.conv2d, since it only takes an integer as filters. The kernel_regularizer argument (for example with an L2 regularizer) looks like the intended route, rather than kernel_constraint.
On the other hand, tf.layers.conv2d handles activation and bias automatically, while you have to write additional code for these if you use tf.nn.conv2d.
All of these other replies talk about how the parameters are different, but actually, the main difference of tf.nn and tf.layers conv2d is that for tf.nn, you need to create your own filter tensor and pass it in. This filter needs to have the size of: [kernel_height, kernel_width, in_channels, num_filters]
Essentially, tf.nn is lower level than tf.layers. Unfortunately, this answer is no longer applicable, as tf.layers is obsolete.
DIFFERENCES IN PARAMETER:
Using tf.layers.* in code:
# Convolution Layer with 32 filters and a kernel size of 5
conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
# Max Pooling (down-sampling) with strides of 2 and kernel size of 2
conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
Using tf.nn.* in code:
(Notice that we need to pass weights and biases additionally as parameters.)
strides = 1
# Weights matrix looks like: [kernel_size(=5), kernel_size(=5), input_channels (=3), filters (= 32)]
# Similarly bias = looks like [filters (=32)]
out = tf.nn.conv2d(input, weights, padding="SAME", strides = [1, strides, strides, 1])
out = tf.nn.bias_add(out, bias)
out = tf.nn.relu(out)
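To illustrate what those three tf.nn calls compute, and why the weights tensor has the shape [kernel, kernel, in_channels, filters], here is a naive NumPy sketch. It is not how TensorFlow implements convolution: the function conv2d_same is a made-up helper that only supports stride 1, odd kernel sizes and NHWC input, and the input data is random.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive stride-1 convolution, NHWC input, zero 'SAME' padding, odd kernel."""
    n, h, width, _ = x.shape
    kh, kw, _, c_out = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw), (0, 0)))
    out = np.zeros((n, h, width, c_out), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            patch = xp[:, i:i + h, j:j + width, :]  # (n, h, width, c_in)
            out += patch @ w[i, j]                  # w[i, j] is (c_in, c_out)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8, 8, 3))
weights = rng.normal(size=(5, 5, 3, 32))  # [kernel, kernel, in_channels, filters]
bias = np.zeros(32)                       # [filters]

out = conv2d_same(x, weights)  # role of tf.nn.conv2d
out = out + bias               # role of tf.nn.bias_add
out = np.maximum(out, 0.0)     # role of tf.nn.relu
print(out.shape)
```

With tf.layers.conv2d the weights and bias arrays are created for you from filters=32 and kernel_size=5, which is the practical difference the snippets above point at.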
Take a look here: tensorflow > tf.layers.conv2d
and here: tensorflow > conv2d
As you can see the arguments to the layers version are:
tf.layers.conv2d(inputs, filters, kernel_size, strides=(1, 1), padding='valid', data_format='channels_last', dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer=None, bias_initializer=tf.zeros_initializer(), kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, trainable=True, name=None, reuse=None)
and the nn version:
tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None)
I think you can choose the one with the options you want/need/like!