How to normalize over all axes with batch normalization?

As far as I understand, for tf.layers.batch_normalization the axis I define is the axis that gets normalized.
Simply put:
Given these values
a = [[0, 2],
[1, 4]]
with shape (2, 2) and therefore axis 0 and 1.
Normalizing over axis 1 would mean to reduce axis 0 to its mean and standard deviation and then take these values for the normalization.
Therefore
bn = tf.layers.batch_normalization(a, axis=[1])
would have (nearly) the same result as
m, v = tf.nn.moments(a, axes=[0])
bn = (a - m) / tf.sqrt(v)
But how would I do tf.layers.batch_normalization over all axes?
With the mean and standard deviation calculation from before this would be easy:
m, v = tf.nn.moments(a, axes=[0, 1])
bn = (a - m) / tf.sqrt(v)
But how to do this with batch normalization?
bn = tf.layers.batch_normalization(a, axis=[???])
I tried the following that doesn't work:
axis = None: AttributeError: 'BatchNormalization' object has no attribute 'axis'
axis = []: IndexError: list index out of range
axis = [0, 1]: All results are zero

Unfortunately, I don't think this is feasible using the batch_normalization layer/function of the TensorFlow API.
As the name of the function suggests, it's intended to perform "batch" normalization, so it's expected to normalize each feature using statistics computed over the current batch (usually dimension 0).
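A possible explanation for the all-zero result with axis = [0, 1], as a NumPy sketch rather than the TensorFlow implementation: batch normalization keeps the axes listed in axis and takes the statistics over all remaining axes. If every axis is listed, nothing is left to reduce over, so each element is its own mean and the output is zero. The helper name batch_norm_sketch and the eps value are assumptions for illustration; the learned scale/offset and the moving averages are left out.

```python
import numpy as np

def batch_norm_sketch(a, axis, eps=1e-3):
    """Keep the axes in `axis`, take mean/variance over all the others.

    Hypothetical helper: no learned scale/offset, no moving averages.
    """
    a = np.asarray(a, dtype=np.float64)
    reduce_axes = tuple(i for i in range(a.ndim) if i not in axis)
    if not reduce_axes:
        # Nothing is reduced: every element is its own mean, variance is 0.
        mean, var = a, np.zeros_like(a)
    else:
        mean = a.mean(axis=reduce_axes, keepdims=True)
        var = a.var(axis=reduce_axes, keepdims=True)
    return (a - mean) / np.sqrt(var + eps)

a = [[0, 2], [1, 4]]
per_feature = batch_norm_sketch(a, axis=[1])   # statistics taken over axis 0
all_kept = batch_norm_sketch(a, axis=[0, 1])   # no axis left to reduce over
print(per_feature)
print(all_kept)
```

Under these assumptions, per_feature is close to [[-1, -1], [1, 1]] and all_kept is all zeros, which matches what the question reports for axis = [0, 1].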

This can be achieved with layer normalization:
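import numpy as np
import tensorflow as tf
data = tf.constant(np.arange(10).reshape(5, 2) * 10, dtype=tf.float32)
# Normalize over both axes, i.e. over every element of the tensor
layer = tf.keras.layers.LayerNormalization(axis=[0, 1])
output = layer(data)
print(output)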
tf.Tensor(
[[-1.5666981 -1.2185429 ]
[-0.8703878 -0.5222327 ]
[-0.17407757 0.17407756]
[ 0.52223265 0.8703878 ]
[ 1.2185429 1.5666981 ]], shape=(5, 2), dtype=float32)
The difference from batch normalization is that layer normalization normally applies the operation to each sample within a batch separately; listing every axis, as above, makes the statistics cover the whole tensor.
If you want the statistics to be computed over the batch, go for batch norm instead. It likewise accepts axis as a list.
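To check what the layer computes here, the same numbers can be reproduced with plain NumPy by taking one mean and one variance over all elements. This is only a sketch: eps = 1e-3 is assumed to match the Keras default, and the learned scale and offset are ignored because they start at 1 and 0.

```python
import numpy as np

data = (np.arange(10).reshape(5, 2) * 10).astype(np.float32)

# Statistics over every axis: one scalar mean and one scalar variance.
mean = data.mean()
var = data.var()   # population variance (divides by N)
eps = 1e-3         # assumed to match the Keras default epsilon
normalized = (data - mean) / np.sqrt(var + eps)
print(normalized)
```

The first entry comes out at about -1.5667 and the last at about 1.5667, in line with the tensor printed above.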

Related

Layer normalization in pytorch

I'm trying to test the layer normalization function of PyTorch,
but I don't know why b[0] and result have different values here.
Did I do something wrong?
import numpy as np
import torch
import torch.nn as nn
a = torch.randn(1, 5)
m = nn.LayerNorm(a.size()[1:], elementwise_affine= False)
b = m(a)
Result:
input: a[0] = tensor([-1.3549, 0.3857, 0.1110, -0.8456, 0.1486])
output: b[0] = tensor([-1.5561, 1.0386, 0.6291, -0.7967, 0.6851])
mean = torch.mean(a[0])
var = torch.var(a[0])
result = (a[0]-mean)/(torch.sqrt(var+1e-5))
Result:
result = tensor([-1.3918, 0.9289, 0.5627, -0.7126, 0.6128])
Also, for n*2 normalization, the result of PyTorch layer norm is always [1.0, -1.0] (or [-1.0, 1.0]). I can't understand why. Please let me know if you have any hints.
a = torch.randn(1, 2)
m = nn.LayerNorm(a.size()[1:], elementwise_affine= False)
b = m(a)
Result:
b = tensor([-1.0000, 1.0000])
For calculating the variance use torch.var(a[0], unbiased=False). Then you will get the same result. By default pytorch calculates the unbiased estimation of the variance.
For your first question, as @Theodor said, you need to pass unbiased=False when calculating the variance.
Only if you want to explore more: as your input size is 5, the unbiased estimate of the variance is 5/4 = 1.25 times the biased estimate, because the unbiased estimate uses N-1 instead of N in the denominator. As a result, each value of the result you computed is sqrt(4/5) = 0.8944 times the corresponding value of b[0].
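The factor can be checked numerically with the values from the question. This is a NumPy sketch, not the PyTorch code path: ddof=0 corresponds to the biased variance, ddof=1 to the unbiased one, and eps = 1e-5 is assumed to match the LayerNorm default.

```python
import numpy as np

# a[0] as printed in the question (rounded to 4 decimals)
a0 = np.array([-1.3549, 0.3857, 0.1110, -0.8456, 0.1486])
eps = 1e-5
centered = a0 - a0.mean()

biased = centered / np.sqrt(a0.var(ddof=0) + eps)     # divides by N
unbiased = centered / np.sqrt(a0.var(ddof=1) + eps)   # divides by N - 1

print(biased)             # compare with b[0] from the question
print(unbiased)           # compare with result from the question
print(unbiased / biased)  # each ratio is close to sqrt(4/5)
```

Because the inputs are rounded, the outputs agree with the question's tensors only to about three decimals.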
About your 2nd question:
And, for n*2 normalization , the result of pytorch layer norm is always [1.0 , -1.0]
This is expected. Suppose the only two elements are a and b. Then the mean is (a+b)/2 and the (biased) variance is ((a-b)^2)/4, so sqrt(variance) = |a-b|/2. The normalization result is therefore [((a-b)/2) / sqrt(variance), ((b-a)/2) / sqrt(variance)], which is [1, -1] if a > b and [-1, 1] if a < b (up to the small epsilon added to the variance).
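A quick numerical check of the two-element case, again as a NumPy sketch with an assumed eps of 1e-5; normalize_pair is a hypothetical helper, not a PyTorch function.

```python
import numpy as np

def normalize_pair(a, b, eps=1e-5):
    """Normalize a two-element vector with the biased variance."""
    x = np.array([a, b], dtype=np.float64)
    return (x - x.mean()) / np.sqrt(x.var() + eps)

pairs = [(0.3, -1.2), (-2.0, 5.0), (10.0, 10.5)]
results = [normalize_pair(a, b) for a, b in pairs]
for (a, b), r in zip(pairs, results):
    print((a, b), r)
```

Whatever the two inputs are, only their order survives: the larger one maps to roughly +1 and the smaller one to roughly -1.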

2D convolution does not give the desired output

I want to use the 2D convolution in the same way I did here in 1D. Unfortunately the output in the former case does not have the desired shape. Let n = 5, then
h_0 = (1 / 4) * np.array([1, 2, 1])
x = np.random.rand(n)
np.convolve(h_0, x, 'same')
>>> array([0.65498075, 0.72729356, 0.51417706, 0.34597679, 0.1793755])
but
h_00 = np.kron(h_0, h_0)
h_00 = np.reshape(h_00, (3, 3))
x = np.random.rand(n, n)
scipy.signal.convolve2d(h_00, x, 'same', boundary='symm')
>>> array([[1.90147294, 1.6541233 , 1.82704077],
[1.55228912, 1.3641027 , 1.55536069],
[1.61190909, 1.45159935, 1.58266083]])
I would have expected a (5, 5) output array.
The docs for scipy.signal.convolve2d regarding the mode parameter clearly state
mode
...
same
    The output is the same size as in1, centered with respect to the ‘full’ output.
So, given that you pass the kernel first, your output will be the same size as the kernel, not the array you are filtering. To fix, swap the first two inputs:
scipy.signal.convolve2d(x, h_00, 'same', boundary='symm')
Confusion likely arises from the behavior of numpy.convolve, which does the following:
mode : {‘full’, ‘valid’, ‘same’}, optional
...
‘same’:
    Mode ‘same’ returns output of length max(M, N). Boundary effects are still visible.
NumPy treats the longer array as the signal and the shorter one as the kernel, regardless of argument order. This is possible because with a single dimension, there is always an unambiguous winner.
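The 1D behaviour is easy to confirm. A small sketch using the kernel from the question and a random length-5 signal:

```python
import numpy as np

h_0 = (1 / 4) * np.array([1, 2, 1])   # length-3 kernel from the question
x = np.random.rand(5)                 # length-5 signal

kernel_first = np.convolve(h_0, x, 'same')
signal_first = np.convolve(x, h_0, 'same')

# Both calls return max(M, N) = 5 samples and the same values.
print(kernel_first.shape, signal_first.shape)
print(np.allclose(kernel_first, signal_first))
```

With scipy.signal.convolve2d there is no such swap, so the argument order decides the output shape.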

Scaling set of rows in a tensor by constant factor

TL;DR: How to scale part of a tensor by 2 (the row indices are given in a tf list)
Details:
indices_of_scaling_ids: Stores list of row_ids
Tensor("Squeeze:0", dtype=int64, device=/device:GPU:0)
[1, 4, 5, 6, 12]
emb_inputs = tf.nn.embedding_lookup(embedding, self.all_rows)
#tensor with shape (batch_size=4, all_row_len, emb_size=128)
So, for every self.all_rows, the emb_inputs is evaluated.
Question / Challenge faced: I need to scale the emb_inputs by 2.0 for every row_ids mentioned in indices_of_scaling_ids.
I have tried various slicing approaches, but can't seem to get to a nice solution. Can someone suggest one? Thanks
N.B. Beginner at Tensorflow
Try with something like this:
SCALE = 2
emb_inputs = ...
indices_of_scaling_ids = ...
emb_shape = tf.shape(emb_inputs)
# Select indices in boolean array (cast so the dtype matches the int64 indices)
r = tf.cast(tf.range(emb_shape[1]), indices_of_scaling_ids.dtype)
mask = tf.reduce_any(tf.equal(r[:, tf.newaxis], indices_of_scaling_ids), axis=1)
# Tile the mask
mask = tf.tile(mask[tf.newaxis, :, tf.newaxis], (emb_shape[0], 1, emb_shape[2]))
# Choose scaled or not depending on indices
result = tf.where(mask, SCALE * emb_inputs, emb_inputs)
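The same idea can be tried out in NumPy with an equivalent formulation: instead of a boolean mask and a where, build a per-row factor vector and let broadcasting apply it. This is a sketch only; the shapes are small illustrative values, and the row ids are assumed to index the second axis of emb_inputs, as in the code above.

```python
import numpy as np

SCALE = 2.0
batch_size, all_row_len, emb_size = 4, 16, 8   # small illustrative shapes
emb_inputs = np.random.rand(batch_size, all_row_len, emb_size)
indices_of_scaling_ids = np.array([1, 4, 5, 6, 12])

# Per-row factor: SCALE for the listed rows, 1.0 everywhere else.
factors = np.ones(all_row_len)
factors[indices_of_scaling_ids] = SCALE

# Broadcast the (all_row_len,) factors across the batch and embedding axes.
result = emb_inputs * factors[np.newaxis, :, np.newaxis]
print(result.shape)
```

Multiplying by a factor vector avoids tiling the mask to the full tensor shape, at the cost of one extra multiplication for the unscaled rows.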

Reducing two tensors in Tensorflow

I have two tensors.
A tensor of shape (1,N)
A tensor of shape (N,T)
What I want to calculate is the following scalar:
tf.reduce_sum seemed helpful, but I couldn't work out how to combine the two tensors with the reduce functions to get what I want. Can someone show me how to write the above equation in TensorFlow?
Does this work?
import tensorflow as tf
import numpy as np
N = 10
T = 20
l = tf.constant(np.random.randn(1, N), dtype=tf.float32)
z = tf.constant(np.random.randn(N, T), dtype=tf.float32)
with tf.Session() as sess:
    # swap axis for broadcasting to work
    l = tf.transpose(l, [1, 0])
    z_div_l = tf.divide(z, l)
    z_div_l_2 = tf.divide(1.0 - z, 1.0 - l)
    result = tf.reduce_sum(tf.add(z_div_l, z_div_l_2), axis=0)
    eval_result = sess.run(result)
    print('{}\n{}'.format(eval_result.shape, eval_result))
This calculates the above expression for every t from 0 to T-1, so it is not a scalar but a vector of size (T,). Your question mentions you want to compute just one scalar, but the sum is only over N and not over T, so I assumed you just want this expression to be evaluated for every t.
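For anyone who wants to check the broadcasting without a session, here is a NumPy sketch of the same computation. The expression is taken from the code above, since the formula in the question is not shown; drawing l from (0.1, 0.9) is my own assumption to keep the divisions well behaved.

```python
import numpy as np

N, T = 10, 20
rng = np.random.default_rng(0)
l = rng.uniform(0.1, 0.9, size=(1, N))   # kept away from 0 and 1 to avoid dividing by zero
z = rng.uniform(0.0, 1.0, size=(N, T))

l_col = l.T   # shape (N, 1), broadcasts against (N, T)
per_t = np.sum(z / l_col + (1.0 - z) / (1.0 - l_col), axis=0)   # shape (T,)

# If a single scalar is wanted after all, sum over T as well.
scalar = per_t.sum()
print(per_t.shape, scalar)
```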

TensorFlow: Max of a tensor along an axis

My question is in two connected parts:
How do I calculate the max along a certain axis of a tensor? For example, if I have
x = tf.constant([[1,220,55],[4,3,-1]])
I want something like
x_max = tf.max(x, axis=1)
print sess.run(x_max)
output: [220,4]
I know there is a tf.argmax and a tf.maximum, but neither give the maximum value along an axis of a single tensor. For now I have a workaround:
x_max = tf.slice(x, begin=[0,0], size=[-1,1])
for a in range(1,3):  # visit every remaining column (x has 3 columns)
    x_max = tf.maximum(x_max , tf.slice(x, begin=[0,a], size=[-1,1]))
But it looks less than optimal. Is there a better way to do this?
Given the indices of an argmax of a tensor, how do I index into another tensor using those indices? Using the example of x above, how do I do something like the following:
ind_max = tf.argmax(x, dimension=1) #output is [1,0]
y = tf.constant([[1,2,3], [6,5,4]])
y_ = y[:, ind_max] #y_ should be [2,6]
I know slicing, like the last line, does not exist in TensorFlow yet (#206).
My question is: what is the best workaround for my specific case (maybe using other methods like gather, select, etc.)?
Additional information: I know x and y are going to be two dimensional tensors only!
The tf.reduce_max() operator provides exactly this functionality. By default it computes the global maximum of the given tensor, but you can specify a list of reduction_indices, which has the same meaning as axis in NumPy. To complete your example:
x = tf.constant([[1, 220, 55], [4, 3, -1]])
x_max = tf.reduce_max(x, reduction_indices=[1])
print sess.run(x_max) # ==> "array([220, 4], dtype=int32)"
If you compute the argmax using tf.argmax(), you could obtain the values from a different tensor y by flattening y using tf.reshape(), converting the argmax indices into vector indices as follows, and using tf.gather() to extract the appropriate values:
ind_max = tf.argmax(x, dimension=1)
y = tf.constant([[1, 2, 3], [6, 5, 4]])
flat_y = tf.reshape(y, [-1]) # Reshape to a vector.
# N.B. Handles 2-D case only.
flat_ind_max = ind_max + tf.cast(tf.range(tf.shape(y)[0]) * tf.shape(y)[1], tf.int64)
y_ = tf.gather(flat_y, flat_ind_max)
print sess.run(y_) # ==> "array([2, 6], dtype=int32)"
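The index arithmetic is easier to follow in NumPy, where it can be run directly. The sketch below mirrors the flatten-and-gather steps and, for comparison, uses np.take_along_axis, which does the same selection without flattening. It illustrates the indexing only and is not TensorFlow code.

```python
import numpy as np

x = np.array([[1, 220, 55], [4, 3, -1]])
y = np.array([[1, 2, 3], [6, 5, 4]])

ind_max = np.argmax(x, axis=1)                    # column of the max in each row
rows, cols = y.shape
flat_ind_max = ind_max + np.arange(rows) * cols   # row-major flat indices
y_flat = y.reshape(-1)[flat_ind_max]

# Same selection without flattening.
y_direct = np.take_along_axis(y, ind_max[:, np.newaxis], axis=1)[:, 0]
print(y_flat, y_direct)
```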
As of TensorFlow 1.10.0-dev20180626, tf.reduce_max accepts axis and keepdims keyword arguments, offering functionality similar to numpy.max.
In [55]: x = tf.constant([[1,220,55],[4,3,-1]])
In [56]: tf.reduce_max(x, axis=1).eval()
Out[56]: array([220, 4], dtype=int32)
To get a result with the same number of dimensions as the input tensor, use keepdims=True
In [57]: tf.reduce_max(x, axis=1, keepdims=True).eval()
Out[57]:
array([[220],
[ 4]], dtype=int32)
If the axis argument is not explicitly specified then the tensor level maximum element is returned (i.e. all axes are reduced).
In [58]: tf.reduce_max(x).eval()
Out[58]: 220
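For comparison, the NumPy calls that these keyword arguments mirror look like this (a small sketch on the same data):

```python
import numpy as np

x = np.array([[1, 220, 55], [4, 3, -1]])

row_max = x.max(axis=1)                       # one maximum per row
row_max_keep = x.max(axis=1, keepdims=True)   # reduced axis kept with length 1
global_max = x.max()                          # all axes reduced

print(row_max, row_max_keep.shape, global_max)
```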
