I use the slim framework for TensorFlow because of its simplicity.
But I want to have a convolutional layer with both biases and batch normalization.
In vanilla TensorFlow, I have:
def conv2d(input_, output_dim, k_h=5, k_w=5, d_h=2, d_w=2, name="conv2d"):
    with tf.variable_scope(name):
        w = tf.get_variable('w', [k_h, k_w, input_.get_shape()[-1], output_dim],
                            initializer=tf.contrib.layers.xavier_initializer(uniform=False))
        conv = tf.nn.conv2d(input_, w, strides=[1, d_h, d_w, 1], padding='SAME')
        biases = tf.get_variable('biases', [output_dim], initializer=tf.constant_initializer(0.0))
        conv = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape())
        tf.summary.histogram("weights", w)
        tf.summary.histogram("biases", biases)
        return conv
d_bn1 = BatchNorm(name='d_bn1')
h1 = lrelu(d_bn1(conv2d(h0, df_dim + y_dim, name='d_h1_conv')))
and I rewrote it for slim like this:
h1 = slim.conv2d(h0,
                 num_outputs=self.df_dim + self.y_dim,
                 scope='d_h1_conv',
                 kernel_size=[5, 5],
                 stride=[2, 2],
                 activation_fn=lrelu,
                 normalizer_fn=layers.batch_norm,
                 normalizer_params=batch_norm_params,
                 weights_initializer=layers.xavier_initializer(uniform=False),
                 biases_initializer=tf.constant_initializer(0.0))
But this code does not add a bias to the conv layer. That is because of https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/layers/python/layers/layers.py#L1025, where the layer is constructed with
layer = layer_class(filters=num_outputs,
                    kernel_size=kernel_size,
                    strides=stride,
                    padding=padding,
                    data_format=df,
                    dilation_rate=rate,
                    activation=None,
                    use_bias=not normalizer_fn and biases_initializer,
                    kernel_initializer=weights_initializer,
                    bias_initializer=biases_initializer,
                    kernel_regularizer=weights_regularizer,
                    bias_regularizer=biases_regularizer,
                    activity_regularizer=None,
                    trainable=trainable,
                    name=sc.name,
                    dtype=inputs.dtype.base_dtype,
                    _scope=sc,
                    _reuse=reuse)
outputs = layer.apply(inputs)
in the construction of the layer. Note the line use_bias=not normalizer_fn and biases_initializer: whenever a normalizer_fn is given, the layer is built without a bias, so using batch normalization means having no bias.
Does that mean I cannot have both biases and batch normalization using slim and the layers library? Or is there another way to have both a bias and batch normalization in a layer when using slim?
Batch normalization already includes the addition of a bias term. Recall that BatchNorm is already:
gamma * normalized(x) + bias
So there is no need (and it makes no sense) to add another bias term in the convolution layer. Simply speaking, BatchNorm shifts the activations by their mean values, so any constant will be cancelled out.
If you still want to do this, you need to remove the normalizer_fn argument and add BatchNorm as a separate layer. Like I said, this makes no sense.
But the solution would be something like:
net = slim.conv2d(net, normalizer_fn=None, ...)
net = slim.batch_norm(net)  # note: tf.nn.batch_normalization would also require mean, variance, offset and scale arguments
Note that BatchNorm relies on non-gradient updates. So you either need to use an optimizer that is compatible with the UPDATE_OPS collection, or you need to add tf.control_dependencies manually.
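A minimal sketch of the tf.control_dependencies route (TF1-style graph code; loss is assumed to be defined elsewhere):
import tensorflow as tf

# Make the train op depend on BatchNorm's moving-average updates,
# which live in the UPDATE_OPS collection.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)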
Long story short: even if you implement ConvWithBias+BatchNorm, it will behave like ConvWithoutBias+BatchNorm, in the same way that multiple fully-connected layers without activation functions behave like a single one.
The reason there is no bias for our convolutional layers is that batch normalization is applied to their outputs. The goal of batch normalization is to get outputs with:
mean = 0
standard deviation = 1
Since we want the mean to be 0, we do not want to add an offset (bias) that will deviate from 0. We want the outputs of our convolutional layer to rely only on the coefficient weights.
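A quick numerical check of this claim (my sketch, not from the answer): normalizing an activation cancels any constant bias, because the mean shifts by exactly the same amount.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))   # a batch of 64 pre-activations, 8 channels
b = 3.7                        # an arbitrary constant bias

def normalize(a, eps=1e-5):
    # per-channel batch normalization without scale/offset
    return (a - a.mean(axis=0)) / np.sqrt(a.var(axis=0) + eps)

print(np.allclose(normalize(x), normalize(x + b)))  # True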
Related
I would like to learn image segmentation in TensorFlow with values in {0.0, 1.0}. I have two images, ground_truth and prediction, each with shape (120, 160). The ground_truth image pixels only contain values that are either 0.0 or 1.0.
The prediction image is the output of a decoder, and its last two layers are a tf.layers.conv2d_transpose and a tf.layers.conv2d, like so:
# transforms (?,120,160,30) -> (?,120,160,15)
outputs = tf.layers.conv2d_transpose(outputs, filters=15, kernel_size=1, strides=1, padding='same')
# ReLU
outputs = tf.nn.relu(outputs)
# transforms (?,120,160,15) -> (?,120,160,1)
outputs = tf.layers.conv2d(outputs, filters=1, kernel_size=1, strides=1, padding='same')
The last layer does not carry an activation function, and thus its output is unbounded. I use the following loss function:
logits = tf.reshape(predicted, [-1, predicted.get_shape()[1] * predicted.get_shape()[2]])
labels = tf.reshape(ground_truth, [-1, ground_truth.get_shape()[1] * ground_truth.get_shape()[2]])
loss = 0.5 * tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))
This setup converges nicely. However, I have realized that the outputs of my last NN layer at validation time seem to be in [-inf, inf]. If I visualize the output, I can see that the segmented object is not segmented, since almost all pixels are "activated". (A histogram of the values for a single output of the last conv2d layer is omitted here.)
Question:
Do I have to post-process the outputs (clip negative values, or run the output through a sigmoid activation, etc.)? What do I need to do to enforce my output values to be {0, 1}?
Solved it. The problem was that tf.nn.sigmoid_cross_entropy_with_logits runs the logits through a sigmoid internally, which of course is not applied at validation time, since the loss operation is only called during training. The solution therefore is:
make sure to run the network outputs through tf.nn.sigmoid at validation/test time, like this:
return output if is_training else tf.nn.sigmoid(output)
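And if a hard {0, 1} mask is needed (the question asks to enforce exactly {0, 1}), thresholding the sigmoid output works; a sketch with illustrative names:
import tensorflow as tf

# `output` stands in for the raw logits from the last conv2d layer
output = tf.placeholder(tf.float32, [None, 120, 160, 1])
probs = tf.nn.sigmoid(output)             # probabilities in (0, 1)
mask = tf.cast(probs > 0.5, tf.float32)   # hard {0.0, 1.0} segmentation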
I'm trying to create a data prediction model with an artificial neural network. The following code is part of a Python-based ANN assembled from many books. Also, the error rate between the predicted value and the actual value doesn't drop below 19%. I tried increasing the number of hidden layers, but it did not affect the error rate much. I think this is probably a limitation of the sigmoid function and of not considering bias. I looked around for a month and found out how to build ReLU and a bias, but I could not find the proper range for the bias and for ReLU.
Q1 = How do I convert sigmoid to ReLU, and Q2 = how do I add bias to my code?
Q3 = Also, if I change sigmoid to ReLU, do I have to make my dataset fit the 0.0~1.0 range? This is because the sigmoid function accepts data in the 0.0~1.0 range, but I don't know what range ReLU allows.
I'm sorry to ask an elementary question.
import numpy
import scipy.special

class neuralNetwork:
    # initialize the neural network
    def __init__(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        self.inodes = input_nodes
        self.hnodes = hidden_nodes
        self.onodes = output_nodes
        # link weight matrices, wih and who
        self.wih = numpy.random.normal(0.0, pow(self.hnodes, -0.5), (self.hnodes, self.inodes))
        self.who = numpy.random.normal(0.0, pow(self.onodes, -0.5), (self.onodes, self.hnodes))
        # learning rate
        self.lr = learning_rate
        # activation function is the sigmoid function
        self.activation_function = lambda x: scipy.special.expit(x)

    # train the neural network
    def train(self, inputs_list, targets_list):
        # convert inputs and targets lists to 2d arrays
        inputs = numpy.array(inputs_list, ndmin=2).T
        targets = numpy.array(targets_list, ndmin=2).T
        # calculate signals into hidden layer
        hidden_inputs = numpy.dot(self.wih, inputs)
        # calculate the signals emerging from hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)
        # calculate signals into final output layer
        final_inputs = numpy.dot(self.who, hidden_outputs)
        # calculate the signals emerging from final output layer
        final_outputs = self.activation_function(final_inputs)
        # output layer error is the (target - actual)
        output_errors = targets - final_outputs
        # hidden layer error is the output_errors, split by weights, recombined at hidden nodes
        hidden_errors = numpy.dot(self.who.T, output_errors)
        # update the weights for the links between the hidden and output layers
        self.who += self.lr * numpy.dot((output_errors * final_outputs * (1.0 - final_outputs)),
                                        numpy.transpose(hidden_outputs))
        # update the weights for the links between the input and hidden layers
        self.wih += self.lr * numpy.dot((hidden_errors * hidden_outputs * (1.0 - hidden_outputs)),
                                        numpy.transpose(inputs))

    # query the neural network
    def query(self, inputs_list):
        # convert inputs list to 2d array
        inputs = numpy.array(inputs_list, ndmin=2).T
        # calculate signals into hidden layer
        hidden_inputs = numpy.dot(self.wih, inputs)
        # calculate the signals emerging from hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)
        # calculate signals into and out of the final output layer
        final_inputs = numpy.dot(self.who, hidden_outputs)
        final_outputs = self.activation_function(final_inputs)
        return final_outputs
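For completeness, a quick usage sketch (the hyperparameters are illustrative, not from the question):
# three inputs, five hidden nodes, two outputs
n = neuralNetwork(input_nodes=3, hidden_nodes=5, output_nodes=2, learning_rate=0.3)
n.train([0.1, 0.5, 0.9], [0.01, 0.99])
print(n.query([0.1, 0.5, 0.9]))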
Your question is too broad, and there are lots of concepts behind ReLU vs sigmoid.
But in short:
Sigmoids saturate and kill gradients (look at gradient descent), and sigmoid outputs are not zero-centered, because 0 < output < 1. I can see that for the sigmoid you are using scipy, but ReLU is easy. ReLU is defined by the following function:
f(x) = max(0, x)
This means that if the input is greater than zero, it returns the input, otherwise it returns 0. ReLU is preferred for hidden layers, with others like softmax for the output layer.
I would say: look at the different activation functions, why we need activation functions in a neural net, how sigmoid kills gradients, and why it converges slowly.
Q1 = How do I convert sigmoid to ReLU, and Q2 = how do I add bias to my code?
Simply write a method of your own based on the ReLU function above and update the following line:
self.activation_function = lambda x: numpy.maximum(0.0, x)  # instead of lambda x: scipy.special.expit(x)
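One caveat the answer glosses over (my addition, not the answerer's): train() above hardcodes the sigmoid derivative f * (1 - f), so swapping in ReLU also means swapping in its derivative. A sketch as a subclass of the asker's class:
import numpy

class reluNetwork(neuralNetwork):
    def __init__(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        super().__init__(input_nodes, hidden_nodes, output_nodes, learning_rate)
        self.activation_function = lambda x: numpy.maximum(0.0, x)
        # ReLU subgradient: 1 where the activation is positive, else 0
        self.activation_derivative = lambda f: (f > 0.0).astype(float)

    def train(self, inputs_list, targets_list):
        inputs = numpy.array(inputs_list, ndmin=2).T
        targets = numpy.array(targets_list, ndmin=2).T
        hidden_outputs = self.activation_function(numpy.dot(self.wih, inputs))
        final_outputs = self.activation_function(numpy.dot(self.who, hidden_outputs))
        output_errors = targets - final_outputs
        hidden_errors = numpy.dot(self.who.T, output_errors)
        # same update rule as before, with the ReLU derivative replacing f * (1 - f)
        self.who += self.lr * numpy.dot(output_errors * self.activation_derivative(final_outputs),
                                        hidden_outputs.T)
        self.wih += self.lr * numpy.dot(hidden_errors * self.activation_derivative(hidden_outputs),
                                        inputs.T)
As for Q2, a common trick (again my addition) is to append a constant 1.0 to each input vector and give self.wih one extra column; the extra weights then act as per-node biases.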
Q3 = Also, if I change sigmoid to ReLU, do I have to make my dataset fit the 0.0~1.0 range? This is because the sigmoid function accepts data in the 0.0~1.0 range, but I don't know what range ReLU allows.
The answer depends on your network and your data, but yes, you should normalize the data. There is no particular range your data must be in, because ReLU returns 0 if the input is less than zero, and the input itself if it is >= 0. So there is no bounded range as with sigmoid.
If you want to look at how ReLU works and can be used, the following detailed examples will help, though they are written using a framework (PyTorch) to build and train the network:
PyTorch Basic Projects Link
ReLU vs sigmoid vs TanH Video
I am trying to use this version of the DCGAN code (implemented in TensorFlow) with some of my data. I run into the problem of the discriminator becoming too strong way too quickly for the generator to learn anything.
Now there are some tricks typically recommended for that problem with GANs:
batch normalisation (already there in the DCGAN code)
giving the generator a head start.
I did a version of the latter by allowing 10 iterations of the generator per 1 of the discriminator (not just in the beginning, but throughout the entire training). (The loss plot is omitted here.)
Adding more generator iterations in this case helps only by slowing down the inevitable: the discriminator grows too strong and suppresses the generator's learning.
Hence I would like to ask for advice: is there another way to help with the problem of a too-strong discriminator?
I think there are several ways to weaken the discriminator:
Try leaky_relu and dropout in the discriminator function:
def leaky_relu(x, alpha, name="leaky_relu"):
    return tf.maximum(x, alpha * x, name=name)
Here is the entire definition:
def discriminator(images, reuse=False):
    # Implement a separate leaky_relu function
    def leaky_relu(x, alpha, name="leaky_relu"):
        return tf.maximum(x, alpha * x, name=name)

    # Leaky parameter alpha
    alpha = 0.2

    # Add batch normalization, kernel initializer, the LeakyReLU activation function, etc. to the layers accordingly
    with tf.variable_scope('discriminator', reuse=reuse):
        # 1st conv with Xavier weight initialization to break symmetry and, in turn, help converge faster and prevent local minima
        images = tf.layers.conv2d(images, 64, 5, strides=2, padding="same",
                                  kernel_initializer=tf.contrib.layers.xavier_initializer())
        # batch normalization
        bn = tf.layers.batch_normalization(images, training=True)
        # leaky ReLU activation function
        relu = leaky_relu(bn, alpha, name="leaky_relu")
        # Dropout: rate=0.2 drops out 20% of the input units (the opposite of keep_prob)
        drop = tf.layers.dropout(relu, rate=0.2)

        # 2nd conv with Xavier weight initialization, 128 filters
        images = tf.layers.conv2d(drop, 128, 5, strides=2, padding="same",
                                  kernel_initializer=tf.contrib.layers.xavier_initializer())
        bn = tf.layers.batch_normalization(images, training=True)
        relu = leaky_relu(bn, alpha, name="leaky_relu")
        drop = tf.layers.dropout(relu, rate=0.2)

        # 3rd conv with Xavier weight initialization, 256 filters, strides=1 without reshape
        images = tf.layers.conv2d(drop, 256, 5, strides=1, padding="same",
                                  kernel_initializer=tf.contrib.layers.xavier_initializer())
        bn = tf.layers.batch_normalization(images, training=True)
        relu = leaky_relu(bn, alpha, name="leaky_relu")
        drop = tf.layers.dropout(relu, rate=0.2)

        # flatten; note 7 * 7 * 256 to match the 256 filters of the 3rd conv (the original 7 * 7 * 128 was a bug)
        flatten = tf.reshape(drop, (-1, 7 * 7 * 256))
        logits = tf.layers.dense(flatten, 1)
        output = tf.sigmoid(logits)
        return output, logits
Add label smoothing to the discriminator loss to prevent the discriminator from becoming too strong. Increase the smooth value according to the d_loss performance.
d_loss_real = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_logits_real,
        labels=tf.ones_like(d_model_real) * (1.0 - smooth)))  # e.g. smooth = 0.1
To summarise this topic, the generic advice would be:
try playing with model parameters (learning rates, for instance)
try adding more variety to the input data
try adjusting the architecture of both the generator and discriminator networks.
However, in my case the issue was the data scaling: I had changed the format of the input data from the initial .jpg to .npy and lost the rescaling on the way. Please note that this DCGAN-tensorflow code rescales the input data to the [-1, 1] range, and the model is tuned to work with this range.
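For reference, a minimal sketch of that rescaling (the file name is hypothetical): map uint8 pixels in [0, 255] to the [-1, 1] range the DCGAN-tensorflow code expects.
import numpy as np

# Load the converted data and rescale from [0, 255] to [-1, 1].
images = np.load("images.npy").astype(np.float32)  # hypothetical file name
images = images / 127.5 - 1.0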
Is there any advantage in using tf.nn.* over tf.layers.*?
Most of the examples in the doc use tf.nn.conv2d, for instance, but it is not clear why they do so.
As GBY mentioned, they use the same implementation.
There is a slight difference in the parameters.
For tf.nn.conv2d:
filter: A Tensor. Must have the same type as input. A 4-D tensor of shape [filter_height, filter_width, in_channels, out_channels]
For tf.layers.conv2d:
filters: Integer, the dimensionality of the output space (i.e. the number of filters in the convolution).
I would use tf.nn.conv2d when loading a pretrained model (example code: https://github.com/ry/tensorflow-vgg16), and tf.layers.conv2d for a model trained from scratch.
For convolution, they are the same. More precisely, tf.layers.conv2d (actually _Conv) uses tf.nn.convolution as the backend. You can follow the calling chain: tf.layers.conv2d > Conv2D > Conv2D.apply() > _Conv > _Conv.apply() > Layer.apply() > Layer.__call__() > _Conv.call() > nn.convolution() ...
As others mentioned, the parameters are different, especially the "filter(s)" argument. tf.nn.conv2d takes a tensor as the filter, which means you can specify the weight decay (or maybe other properties) like the following from the cifar10 code. (Whether you want/need weight decay in the conv layer is another question.)
kernel = _variable_with_weight_decay('weights',
                                     shape=[5, 5, 3, 64],
                                     stddev=5e-2,
                                     wd=0.0)
conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
I'm not quite sure how to set weight decay in tf.layers.conv2d, since it only takes an integer for filters. Maybe using kernel_constraint?
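(Not part of the original answer, but for the record: tf.layers.conv2d does accept a kernel_regularizer argument, which is the usual route to weight decay there. A TF1-style sketch with an illustrative input shape:)
import tensorflow as tf

images = tf.placeholder(tf.float32, [None, 24, 24, 3])  # illustrative shape
conv = tf.layers.conv2d(
    images, filters=64, kernel_size=5, padding='same',
    kernel_regularizer=tf.contrib.layers.l2_regularizer(scale=1e-4))
# The penalty terms land in a collection and must be added to the task loss by hand:
weight_decay = tf.add_n(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))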
On the other hand, tf.layers.conv2d handles the activation and bias automatically, while you have to write additional code for these if you use tf.nn.conv2d.
All of these other replies talk about how the parameters are different, but actually the main difference between tf.nn.conv2d and tf.layers.conv2d is that for tf.nn you need to create your own filter tensor and pass it in. This filter needs to have the shape [kernel_height, kernel_width, in_channels, num_filters].
Essentially, tf.nn is lower level than tf.layers. Unfortunately, this answer is no longer applicable, as tf.layers is obsolete.
DIFFERENCES IN PARAMETERS:
Using tf.layers.* in code:
# Convolution Layer with 32 filters and a kernel size of 5
conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
# Max Pooling (down-sampling) with strides of 2 and kernel size of 2
conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
Using tf.nn.* in code:
(Notice we need to create the weights and biases ourselves and pass them in.)
strides = 1
# Weights look like: [kernel_size(=5), kernel_size(=5), input_channels(=3), filters(=32)]
weights = tf.Variable(tf.random_normal([5, 5, 3, 32]))
# Similarly, bias looks like [filters(=32)]
bias = tf.Variable(tf.zeros([32]))
out = tf.nn.conv2d(input, weights, padding="SAME", strides=[1, strides, strides, 1])
out = tf.nn.bias_add(out, bias)
out = tf.nn.relu(out)
Take a look here: tensorflow > tf.layers.conv2d
and here: tensorflow > conv2d
As you can see, the arguments to the layers version are:
tf.layers.conv2d(inputs, filters, kernel_size, strides=(1, 1), padding='valid', data_format='channels_last', dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer=None, bias_initializer=tf.zeros_initializer(), kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, trainable=True, name=None, reuse=None)
and the nn version:
tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None)
I think you can choose the one with the options you want/need/like!
I'm trying to train a multilayer perceptron to classify between true and false, based on the given input.
So far I'm using the example:
https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/multilayer_perceptron.py
But this gives me the output as a binary value, and I'd rather have a decimal or percentage-based output.
What I've tried:
I've tried changing the optimizer to the other available ones, with no success.
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
The optimizer will not change the output that is actually produced by the layers.
The provided example uses ReLU for its layers, which is good for classification, but to model probability it wouldn't work. You would be better off with a sigmoid function instead.
The sigmoid function can be used to model probability, whereas ReLU can be used to model positive real numbers.
In order to make it work for the provided example, change the multilayer_perceptron function to:
def multilayer_perceptron(_X, _weights, _biases):
    # Hidden layer with sigmoid activation
    layer_1 = tf.sigmoid(tf.add(tf.matmul(_X, _weights['h1']), _biases['b1']), name="sigmoid_l1")
    # Hidden layer with sigmoid activation
    layer_2 = tf.sigmoid(tf.add(tf.matmul(layer_1, _weights['h2']), _biases['b2']), name="sigmoid_l2")
    # Output layer with linear activation (raw logits)
    return tf.matmul(layer_2, _weights['out'], name="matmul_lout") + _biases['out']
It basically replaces the ReLU activation with a sigmoid one.
Then, for the evaluation, use softmax as follows:
output1 = tf.nn.softmax((multilayer_perceptron(x, weights, biases)), name="output")
avd = sess.run(output1, feed_dict={x: features_t})
It will give you a value between 0 and 1 for each class. Also, you'll probably have to increase the number of epochs for this to work.