Are these functions equivalent in TensorFlow?

I'm a newbie with TensorFlow and I've been studying it for the last few days.
I want to understand if the following two functions are equivalent or not:
1.
softmax = tf.add(tf.matmul(x, weights), biases, name=scope.name)
2.
softmax = tf.nn.softmax(tf.matmul(x, weights) + biases, name=scope.name)
If they are in fact different, what is the main difference?

softmax1 = tf.add(tf.matmul(x, weights), biases, name=scope.name)
is not equal to
softmax2 = tf.nn.softmax(tf.matmul(x, weights) + biases, name=scope.name)
since softmax1 has no softmax calculation at all while softmax2 does. See the TensorFlow API for tf.nn.softmax. The general idea of a softmax is that it normalizes the input by rescaling the whole data sequence so that its entries lie in the interval (0, 1) and sum to 1.
The only thing that is equal between the two statements is the basic calculation: + does the same thing as tf.add, so tf.add(tf.matmul(x, weights), biases) is equal to tf.matmul(x, weights) + biases.
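A quick runnable check of that equality (a sketch in TF 1.x graph style, matching the code in the question):

import tensorflow as tf

a = tf.constant([[1.0, 2.0]])
b = tf.constant([[3.0, 4.0]])
with tf.Session() as sess:
    print(sess.run(tf.add(a, b)))  # [[4. 6.]]
    print(sess.run(a + b))         # [[4. 6.]] -- same result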
EDIT: To add some clarification (it seems you may not be entirely sure what softmax is doing):
tf.matmul(x, W) + bias
Calculates a matrix multiplication between x (your input vector) and W, the weights of the current layer. Afterwards the bias is added.
This calculation models the pre-activation of one layer. On top of that you have an activation function, like the sigmoid, which transforms the pre-activation. So for one layer you normally do something like this:
h1 = tf.sigmoid(tf.matmul(x, W) + bias)
Here h1 would be the activation of this layer.
The softmax operation simply rescales your input. E.g., if you got this activation on your output layer:
output = [[1.0, 2.0, 3.0, 5.0, 0.5, 0.2]]
The softmax rescales this input so that the values fit in the interval (0, 1) and sum to 1:
tf.nn.softmax(output)
> [[ 0.01497873, 0.0407164 , 0.11067866, 0.81781083, 0.00908506,
0.00673038]]
tf.reduce_sum(tf.nn.softmax(output))
> 1.0
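As a cross-check, softmax is just exp(x) normalized by its sum. A minimal NumPy sketch (not from the original answer; the max-subtraction is only for numerical stability):

import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

output = np.array([1.0, 2.0, 3.0, 5.0, 0.5, 0.2])
print(softmax(output))        # matches the tf.nn.softmax values above
print(softmax(output).sum())  # 1.0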


Why does this TensorFlow example not have a summation before the activation function?

I'm trying to understand a TensorFlow code snippet. What I've been taught is that we sum all the incoming inputs and then pass them to an activation function. Shown in the picture below is a single neuron. Notice that we compute a weighted sum of the inputs and THEN compute the activation.
In most examples of the multi-layer perceptron, they don't include the summation step. I find this very confusing.
Here is an example of one of those snippets:
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
# Create model
def multilayer_perceptron(x):
    # Hidden fully connected layer with 256 neurons
    layer_1 = tf.nn.relu(tf.add(tf.matmul(x, weights['h1']), biases['b1']))
    # Hidden fully connected layer with 256 neurons
    layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, weights['h2']), biases['b2']))
    # Output fully connected layer with a neuron for each class
    out_layer = tf.nn.relu(tf.matmul(layer_2, weights['out']) + biases['out'])
    return out_layer
In each layer, we first multiply the inputs with the weights. Afterwards, we add the bias term. Then we pass those to tf.nn.relu. Where does the summation happen? It looks like we've skipped it!
Any help would be really great!
The last layer of your model, out_layer, outputs probabilities of each class Prob(y=yi|X) and has shape [batch_size, n_classes]. To calculate these probabilities, the softmax function is applied. For each single input data point x that your model receives, it outputs a vector of probabilities y whose size is the number of classes. You then pick the one with the highest probability by applying argmax to the output vector, class = argmax(P(y|x)), which can be written in TensorFlow as y_pred = tf.argmax(out_layer, 1).
Consider a network with a single layer. You have an input matrix X of shape [n_samples, x_dimension] and you multiply it by some matrix W of shape [x_dimension, model_output]. The summation that you're talking about is the dot product between a row of matrix X and a column of matrix W. The output will then have shape [n_samples, model_output]. On this output you apply an activation function (if it is the final layer you probably want softmax). Perhaps the picture that you've shown is a bit misleading.
Mathematically, the layer without bias can be described as Y = XW. Suppose the first row of X (a single input data point) is

x = (x_1, x_2, ..., x_n)

and the first column of W is

w = (w_1, w_2, ..., w_n)^T

The result of this dot product is given by

x . w = x_1*w_1 + x_2*w_2 + ... + x_n*w_n

which is your summation. You repeat this for each column in matrix W and the result is a vector of size model_output (which corresponds to the number of columns in W). To this vector you add the bias (if needed) and then apply the activation.
The tf.matmul operator performs a matrix multiplication, which means that each element in the resulting matrix is a sum of products (which corresponds exactly to what you describe).
Take a simple example with a row-vector and a column-vector, as would be the case if you had exactly one neuron and an input vector (as per the graphic you shared above):
x = [2,3,1]
y = [3,
1,
2]
Then the result would be:
tf.matmul(x, y) = 2*3 + 3*1 + 1*2 = 11
There you can see the weighted sum.
p.s: tf.multiply performs element-wise multiplication, which is not what we want here.
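The same example runs if you give the operands proper 2-D shapes; a quick NumPy sketch (the @ operator mirrors tf.matmul):

import numpy as np

x = np.array([[2.0, 3.0, 1.0]])      # a 1x3 row vector of inputs
y = np.array([[3.0], [1.0], [2.0]])  # a 3x1 column vector of weights
print(x @ y)                         # [[11.]] -- the weighted sum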

How can I set Bias and change Sigmoid to ReLU function in ANN?

I'm trying to create a data prediction model using artificial neural networks. The following code is part of a Python-based ANN put together from many books. The error rate between the predicted value and the actual value doesn't fall below 19%. I tried to increase the number of hidden layers, but it did not noticeably affect the error rate. I think this is probably a limitation of the sigmoid function and of not using a bias. I looked around for a month and found out how to build ReLU and a bias, but I could not find what range of values the bias and ReLU require.
Q1 = How do I convert sigmoid to ReLU and Q2 = how do I add bias to my code?
Q3 = Also, if I change sigmoid to ReLU, do I have to scale my dataset to the 0.0~1.0 range? This is because the sigmoid function accepts a 0.0~1.0 range of data, but I don't know what range ReLU allows.
I'm sorry to ask an elementary question.
class neuralNetwork:
    # initialize the neural network
    def __init__(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        self.inodes = input_nodes
        self.hnodes = hidden_nodes
        self.onodes = output_nodes
        # link weight matrices, wih and who
        self.wih = numpy.random.normal(0.0, pow(self.hnodes, -0.5), (self.hnodes, self.inodes))
        self.who = numpy.random.normal(0.0, pow(self.onodes, -0.5), (self.onodes, self.hnodes))
        # learning rate
        self.lr = learning_rate
        # activation function is the sigmoid function
        self.activation_function = lambda x: scipy.special.expit(x)

    # train the neural network
    def train(self, inputs_list, targets_list):
        # convert inputs list to 2d array
        inputs = numpy.array(inputs_list, ndmin=2).T
        targets = numpy.array(targets_list, ndmin=2).T
        # calculate signals into hidden layer
        hidden_inputs = numpy.dot(self.wih, inputs)
        # calculate the signals emerging from hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)
        # calculate signals into final output layer
        final_inputs = numpy.dot(self.who, hidden_outputs)
        # calculate the signals emerging from final output layer
        final_outputs = self.activation_function(final_inputs)
        # output layer error is the (target - actual)
        output_errors = targets - final_outputs
        # hidden layer error is the output_errors, split by weights, recombined at hidden nodes
        hidden_errors = numpy.dot(self.who.T, output_errors)
        # update the weights for the links between the hidden and output layers
        self.who += self.lr * numpy.dot((output_errors * final_outputs * (1.0 - final_outputs)), numpy.transpose(hidden_outputs))
        # update the weights for the links between the input and hidden layers
        self.wih += self.lr * numpy.dot((hidden_errors * hidden_outputs * (1.0 - hidden_outputs)), numpy.transpose(inputs))

    # query the neural network
    def query(self, inputs_list):
        # convert inputs list to 2d array
        inputs = numpy.array(inputs_list, ndmin=2).T
        # calculate signals into hidden layer
        hidden_inputs = numpy.dot(self.wih, inputs)
        # calculate the signals emerging from hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)
        # calculate signals into final output layer
        final_inputs = numpy.dot(self.who, hidden_outputs)
        # calculate the signals emerging from final output layer
        final_outputs = self.activation_function(final_inputs)
        return final_outputs
Your question is quite broad and there are lots of concepts behind ReLU vs sigmoid.
But in short:
Sigmoids saturate and kill gradients (look at gradient descent), and sigmoid outputs are not zero-centered because the output of a sigmoid is 0 < output < 1. I can see that for sigmoid you are using scipy, but for ReLU it's easy. ReLU is defined by the following function:
f(x) = max(0, x)
This means if the input is greater than zero, return the input, else return 0. ReLU is preferred for hidden layers, while others like softmax are preferred for output layers.
I would suggest looking at different activation functions and why we need activation functions in a neural net, how sigmoid kills gradients, and why it makes convergence slow.
Q1 = How do I convert sigmoid to ReLU and Q2 = how do I add bias to my code?
Simply write a method of your own based on the ReLU function above, and update the following line:
self.activation_function = lambda x: numpy.maximum(0.0, x)  # instead of lambda x: scipy.special.expit(x)
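Note that train() also uses the sigmoid derivative, final_outputs*(1.0-final_outputs), when backpropagating; with ReLU that factor needs to change as well. A minimal sketch of both pieces (my own addition, not from the original code):

import numpy

def relu(x):
    # element-wise max(0, x); works on whole numpy arrays
    return numpy.maximum(0.0, x)

def relu_derivative(x):
    # gradient is 1 where the input was positive, 0 elsewhere
    return (x > 0).astype(float)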
Q3 = Also, if I change sigmoid to ReLU, do I have to scale my dataset to the 0.0~1.0 range? This is because the sigmoid function accepts a 0.0~1.0 range of data, but I don't know what range ReLU allows.
The answer depends on your network and your data, but yes, you normally normalize the data. There is no fixed range you need to squeeze your data into, because for ReLU: if the input is less than zero it returns 0, and if the input is >= 0 it returns the input. So there is no bounded range as with sigmoid.
If you want to look at how ReLU works and can be used, the following detailed examples will help, though they are written using a framework (PyTorch) to build and train the network:
PyTorch Basic Projects Link
ReLU vs sigmoid vs TanH Video

Cannot use both bias and batch normalization in convolution layers

I use the slim framework for TensorFlow because of its simplicity.
But I want to have a convolutional layer with both biases and batch normalization.
In vanilla TensorFlow, I have:
def conv2d(input_, output_dim, k_h=5, k_w=5, d_h=2, d_w=2, name="conv2d"):
    with tf.variable_scope(name):
        w = tf.get_variable('w', [k_h, k_w, input_.get_shape()[-1], output_dim],
                            initializer=tf.contrib.layers.xavier_initializer(uniform=False))
        conv = tf.nn.conv2d(input_, w, strides=[1, d_h, d_w, 1], padding='SAME')
        biases = tf.get_variable('biases', [output_dim], initializer=tf.constant_initializer(0.0))
        conv = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape())
        tf.summary.histogram("weights", w)
        tf.summary.histogram("biases", biases)
        return conv
d_bn1 = BatchNorm(name='d_bn1')
h1 = lrelu(d_bn1(conv2d(h0, df_dim + y_dim, name='d_h1_conv')))
and I rewrote it to slim by this:
h1 = slim.conv2d(h0,
                 num_outputs=self.df_dim + self.y_dim,
                 scope='d_h1_conv',
                 kernel_size=[5, 5],
                 stride=[2, 2],
                 activation_fn=lrelu,
                 normalizer_fn=layers.batch_norm,
                 normalizer_params=batch_norm_params,
                 weights_initializer=layers.xavier_initializer(uniform=False),
                 biases_initializer=tf.constant_initializer(0.0))
But this code does not add a bias to the conv layer.
That is because of https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/layers/python/layers/layers.py#L1025 where the layer is constructed as
layer = layer_class(filters=num_outputs,
                    kernel_size=kernel_size,
                    strides=stride,
                    padding=padding,
                    data_format=df,
                    dilation_rate=rate,
                    activation=None,
                    use_bias=not normalizer_fn and biases_initializer,
                    kernel_initializer=weights_initializer,
                    bias_initializer=biases_initializer,
                    kernel_regularizer=weights_regularizer,
                    bias_regularizer=biases_regularizer,
                    activity_regularizer=None,
                    trainable=trainable,
                    name=sc.name,
                    dtype=inputs.dtype.base_dtype,
                    _scope=sc,
                    _reuse=reuse)
outputs = layer.apply(inputs)
Note the use_bias=not normalizer_fn and biases_initializer argument: passing a normalizer_fn disables the bias, which results in not having a bias when using batch normalization.
Does that mean that I cannot have both biases and batch normalization using the slim and layers library? Or is there another way to have both bias and batch normalization in a layer when using slim?
Batch normalization already includes the addition of a bias term. Recall that BatchNorm is already:
gamma * normalized(x) + bias
So there is no need (and it makes no sense) to add another bias term in the convolution layer. Simply speaking, BatchNorm shifts the activations by their mean values, hence any constant will be canceled out.
If you still want to do this, you need to remove the normalizer_fn argument and add BatchNorm as a single layer. Like I said, this makes no sense.
But the solution would be something like
net = slim.conv2d(net, normalizer_fn=None, ...)
net = slim.batch_norm(net)
Note that BatchNorm relies on non-gradient updates, so you either need an optimizer setup that runs the UPDATE_OPS collection, or you need to add the dependency manually with tf.control_dependencies.
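A minimal sketch of that pattern (assuming a loss tensor is already defined):

# run the batch-norm moving-average updates before each training step
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)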
Long story short: even if you implement ConvWithBias+BatchNorm, it will behave like ConvWithoutBias+BatchNorm, in the same way that multiple fully-connected layers without an activation function behave like a single one.
The reason there is no bias in our convolutional layers is that we have batch normalization applied to their outputs. The goal of batch normalization is to get outputs with:
mean = 0
standard deviation = 1
Since we want the mean to be 0, we do not want to add an offset (bias) that will deviate from 0. We want the outputs of our convolutional layer to rely only on the coefficient weights.
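A quick numerical check of this claim (a NumPy sketch, not from the original answer): mean-centering removes whatever constant offset was added beforehand.

import numpy as np

x = np.random.randn(1000)
for bias in (0.0, 5.0):
    y = x + bias
    z = (y - y.mean()) / y.std()
    print(z[:3])  # the same values with or without the bias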

Initializing a bias term in my nonlinear regression model using TensorFlow

I am trying to make a basic nonlinear regression model that will predict the return index of companies in the FTSE350.
I am unsure as to what my bias term should look like in terms of dimensions and whether I am using it properly in the calculations method:
w1 = tf.Variable(tf.truncated_normal([4, 10], mean=0.0, stddev=1.0, dtype=tf.float64))
b1 = tf.Variable(tf.constant(0.1, shape=[4,10], dtype = tf.float64))
w2 = tf.Variable(tf.truncated_normal([10, 1], mean=0.0, stddev=1.0, dtype=tf.float64))
b2 = tf.Variable(tf.constant(0.1, shape=[1], dtype = tf.float64))
def calculations(x, y):
    w1d = tf.matmul(x, w1)
    h1 = tf.nn.sigmoid(tf.add(w1d, b1))
    h1w2 = tf.matmul(h1, w2)
    activation = tf.add(tf.nn.sigmoid(tf.matmul(h1, w2)), b2)
    error = tf.reduce_sum(tf.pow(activation - y, 2)) / len(x)
    return [activation, error]
My initial thoughts were that it should be the same size as my weights but I get this error:
ValueError: Dimensions must be equal, but are 251 and 4 for 'Add' (op: 'Add') with input shapes: [251,10], [4,10]
I've played around with different ideas but don't seem to be getting anywhere.
(My input data has 4 features)
The network structure I have attempted is 4 neurons in the input layer, 10 in the hidden layer, and 1 in the output layer, but I feel like I may have mixed up the dimensions in my weight matrices too?
When you are constructing the layers for a feed-forward fully-connected neural network (like in your example), the shape of the biases should be equal to the number of nodes in the corresponding layer. So in your case, since your weight matrix has a shape of (4, 10), you have 10 nodes in that layer and you should be using:
b1 = tf.Variable(tf.constant(0.1, shape=[10], dtype=tf.float64))
The reason for this is when you do w1d = tf.matmul(x, w1), you are actually getting a matrix of shape (batch_size, 10) (if batch_size is the number of rows in your input matrix). This is because you are matrix multiplying a (batch_size, 4) matrix by a (4, 10) weight matrix. Then, you are adding a bias across each column of w1d, which can be represented as a 10-dimensional vector, which you would get if you made the shape of b1 [10].
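A small NumPy sketch of that shape arithmetic (251 is the batch size taken from the error message above):

import numpy as np

x = np.random.randn(251, 4)   # (batch_size, 4) inputs
w1 = np.random.randn(4, 10)   # (4, 10) weights
b1 = np.full(10, 0.1)         # one bias per hidden node
h = x @ w1 + b1               # b1 broadcasts across all 251 rows
print(h.shape)                # (251, 10)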
Without the non-linearity (sigmoid) afterward, this is called an affine transformation, which you can read more about here: https://en.wikipedia.org/wiki/Affine_transformation.
Another fantastic resource is the Stanford Deep Learning Tutorial, which has a good explanation of how these feed-forward models work here:
http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/.
Hope that helped!
I think your b1 should just be of dimension 10 and your code should run.
Since 4 is the number of features and 10 is the number of neurons in your first layer (thinking in neural-net terms),
you must add a bias of dimension 10.
You might also see the biases as adding an extra feature of constant value 1.
See this PDF if you have time; it explains it very well: https://cs.stanford.edu/~quocle/tutorial1.pdf

Tensorflow embedding_lookup

I am trying to learn the word representation of the imdb dataset "from scratch" through the TensorFlow tf.nn.embedding_lookup() function. If I understand it correctly, I have to set up an embedding layer before the other hidden layer, and then when I perform gradient descent, the layer will "learn" a word representation in the weights of this layer. However, when I try to do this, I get a shape error between my embedding layer and the first fully-connected layer of my network.
def multilayer_perceptron(_X, _weights, _biases):
    with tf.device('/cpu:0'), tf.name_scope("embedding"):
        W = tf.Variable(tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0), name="W")
        embedding_layer = tf.nn.embedding_lookup(W, _X)
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(embedding_layer, _weights['h1']), _biases['b1']))
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, _weights['h2']), _biases['b2']))
    return tf.matmul(layer_2, weights['out']) + biases['out']

x = tf.placeholder(tf.int32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])

pred = multilayer_perceptron(x, weights, biases)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
train_step = tf.train.GradientDescentOptimizer(0.3).minimize(cost)
init = tf.initialize_all_variables()
The error I get is:
ValueError: Shapes TensorShape([Dimension(None), Dimension(300), Dimension(128)])
and TensorShape([Dimension(None), Dimension(None)]) must have the same rank
The shape error arises because you are using a two-dimensional tensor, x, to index into a two-dimensional embedding tensor, W. Think of tf.nn.embedding_lookup() (and its close cousin tf.gather()) as taking each integer value i in x and replacing it with the row W[i, :]. From the error message, one can infer that n_input = 300 and embedding_size = 128. In general, the result of tf.nn.embedding_lookup() has a number of dimensions equal to rank(x) + rank(W) - 1; in this case, 3. The error arises when you try to multiply this result by _weights['h1'], which is a (two-dimensional) matrix.
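To see the rank arithmetic concretely, here is a NumPy sketch of the lookup (fancy indexing gathers rows the same way):

import numpy as np

W = np.random.randn(1000, 128)               # (vocab_size, embedding_size)
ids = np.random.randint(0, 1000, (32, 300))  # a batch of 300 word ids each
print(W[ids].shape)                          # (32, 300, 128): rank 2 + 2 - 1 = 3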
How to fix this code depends on what you're trying to do, and on why you are passing a matrix of inputs to the embedding. One common thing to do is to aggregate the embedding vectors for each input example into a single row per example, using an operation like tf.reduce_sum(). For example, you might do the following:
W = tf.Variable(
    tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0), name="W")
embedding_layer = tf.nn.embedding_lookup(W, _X)
# Reduce along dimension 1 (`n_input`) to get a single vector (row)
# per input example.
embedding_aggregated = tf.reduce_sum(embedding_layer, [1])
layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(
    embedding_aggregated, _weights['h1']), _biases['b1']))
Another possible solution: instead of adding the embedding vectors up, concatenate them into a single vector and increase the number of neurons in the hidden layer.
I used:
embedding_aggregated = tf.reshape(embedding_layer, [-1, embedding_size * sequence_length])
I also changed the number of neurons in the hidden layer to embedding_size * sequence_length.
Observation: accuracy also improved with concatenation rather than addition.
