What is the difference between np.mean and tf.reduce_mean?

In the MNIST beginner tutorial, there is the statement
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
tf.cast basically changes the tensor's type, but what is the difference between tf.reduce_mean and np.mean?
Here is the doc on tf.reduce_mean:
reduce_mean(input_tensor, reduction_indices=None, keep_dims=False, name=None)
input_tensor: The tensor to reduce. Should have numeric type.
reduction_indices: The dimensions to reduce. If None (the default), reduces all dimensions.
# 'x' is [[1., 1.],
#         [2., 2.]]
tf.reduce_mean(x) ==> 1.5
tf.reduce_mean(x, 0) ==> [1.5, 1.5]
tf.reduce_mean(x, 1) ==> [1., 2.]
For a 1D vector, it looks like np.mean == tf.reduce_mean, but I don't understand what's happening in tf.reduce_mean(x, 1) ==> [1., 2.]. tf.reduce_mean(x, 0) ==> [1.5, 1.5] kind of makes sense, since mean of [1, 2] and [1, 2] is [1.5, 1.5], but what's going on with tf.reduce_mean(x, 1)?

The functionality of numpy.mean and tensorflow.reduce_mean is the same: they do the same thing, as you can see from the numpy and tensorflow documentation. Let's look at an example:
import numpy as np
import tensorflow as tf

c = np.array([[3.,4], [5.,6], [6.,7]])
print(np.mean(c, 1))

Mean = tf.reduce_mean(c, 1)
with tf.Session() as sess:
    result = sess.run(Mean)
    print(result)
Output
[ 3.5 5.5 6.5]
[ 3.5 5.5 6.5]
Here you can see that when axis (numpy) or reduction_indices (tensorflow) is 1, it computes the mean across (3,4), (5,6) and (6,7), so 1 defines the axis across which the mean is computed. When it is 0, the mean is computed across (3,5,6) and (4,6,7), and so on. I hope you get the idea.
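For completeness, here is a quick check of the axis=0 case on the same array (plain numpy, values as above):

import numpy as np

c = np.array([[3., 4], [5., 6], [6., 7]])
print(np.mean(c, 0))  # [4.66666667 5.66666667] -- means of (3,5,6) and (4,6,7)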
Now what are the differences between them?
You can compute the numpy operation anywhere in Python. But in order to run a tensorflow operation, it must be done inside a tensorflow Session. You can read more about it here. So whenever you need to perform any computation on your tensorflow graph (or structure, if you will), it must be done inside a tensorflow Session.
Let's look at another example.
npMean = np.mean(c)
print(npMean + 1)

tfMean = tf.reduce_mean(c)
Add = tfMean + 1
with tf.Session() as sess:
    result = sess.run(Add)
    print(result)
We could increase the mean by 1 in numpy just as you naturally would, but to do it in tensorflow you need to perform it inside a Session; without a Session you can't do that. In other words, when you write tfMean = tf.reduce_mean(c), tensorflow doesn't compute it then. It only computes it in a Session. But numpy computes it instantly, the moment you call np.mean().
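A minimal sketch of this difference (same array c as above; the printed Tensor name is just whatever TensorFlow assigns):

import numpy as np
import tensorflow as tf

c = np.array([[3., 4], [5., 6], [6., 7]])
print(np.mean(c) + 1)            # evaluated immediately: 6.166666...
tfMean = tf.reduce_mean(c)
print(tfMean)                    # just a graph node, e.g. Tensor("Mean:0", ...), no value yet
with tf.Session() as sess:
    print(sess.run(tfMean + 1))  # 6.166666... -- only now is the mean actually computed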
I hope it makes sense.

The key here is the word reduce, a concept from functional programming, which makes it possible for reduce_mean in TensorFlow to keep a running average of the results of computations from a batch of inputs.
If you are not familiar with functional programming, this can seem mysterious. So first let us see what reduce does. If you were given a list like [1,2,5,4] and were told to compute the mean, that is easy - just pass the whole array to np.mean and you get the mean. However what if you had to compute the mean of a stream of numbers? In that case, you would have to first assemble the array by reading from the stream and then call np.mean on the resulting array - you would have to write some more code.
An alternative is to use the reduce paradigm. As an example, look at how we can use reduce in python to calculate the sum of numbers:
reduce(lambda x, y: x + y, [1, 2, 5, 4])
(In Python 3, reduce lives in functools: from functools import reduce.)
It works like this:
Step 1: Read two numbers from the list: 1 and 2. Evaluate lambda(1, 2). reduce stores the result, 3. Note: this is the only step where two numbers are read off the list.
Step 2: Read the next number from the list: 5. Evaluate lambda(3, 5) (3 being the result from step 1 that reduce stored). reduce stores the result, 8.
Step 3: Read the next number from the list: 4. Evaluate lambda(8, 4) (8 being the result of step 2 that reduce stored). reduce stores the result, 12.
Step 4: Read the next number from the list: there are none, so return the stored result, 12.
Read more here Functional Programming in Python
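To tie this back to the stream example above, here is a minimal sketch of a running mean with functools.reduce; the step function and the (count, mean) accumulator are my own illustration, not anything from TensorFlow:

from functools import reduce

def step(acc, x):
    # acc carries (count, running mean); fold the new number in
    count, mean = acc
    count += 1
    return count, mean + (x - mean) / count

stream = iter([1, 2, 5, 4])                  # stands in for numbers arriving one at a time
count, mean = reduce(step, stream, (0, 0.0))
print(mean)                                  # 3.0, without ever building the full list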
To see how this applies to TensorFlow, look at the following block of code, which defines a simple graph that takes in floats and computes their mean. The input to the graph, however, is not a single float but an array of floats; reduce_mean computes the mean value over all of them.
import tensorflow as tf

inp = tf.placeholder(tf.float32)
mean = tf.reduce_mean(inp)

x = [1, 2, 3, 4, 5]

with tf.Session() as sess:
    print(mean.eval(feed_dict={inp: x}))
This pattern comes in handy when computing values over batches of images. Look at The Deep MNIST Example where you see code like:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
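To make the pattern concrete, here is a minimal, self-contained sketch with made-up predictions (the boolean values are purely illustrative):

import tensorflow as tf

# Pretend 3 of 4 predictions in a batch matched the labels
correct_prediction = tf.constant([True, True, False, True])
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    print(sess.run(accuracy))  # 0.75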

The new documentation states that tf.reduce_mean() produces the same results as np.mean:
Equivalent to np.mean
It also has absolutely the same parameters as np.mean. But here is an important difference: they produce the same results only on float values:
import tensorflow as tf
import numpy as np
from random import randint

num_dims = 10
rand_dim = randint(0, num_dims - 1)
c = np.random.randint(50, size=tuple([5] * num_dims)).astype(float)

with tf.Session() as sess:
    r1 = sess.run(tf.reduce_mean(c, rand_dim))
    r2 = np.mean(c, rand_dim)
    is_equal = np.array_equal(r1, r2)
    print(is_equal)
    if not is_equal:
        print(r1)
        print(r2)
If you remove the type conversion, you will see different results.
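For instance, on integer input (a minimal sketch; the values are arbitrary), tf.reduce_mean keeps the integer dtype and truncates, while np.mean promotes to float:

import numpy as np
import tensorflow as tf

c = np.array([1, 2], dtype=np.int32)
with tf.Session() as sess:
    print(sess.run(tf.reduce_mean(c)))  # 1   -- result stays int32, so the mean is truncated
print(np.mean(c))                       # 1.5 -- numpy promotes to float64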
In addition to this, many other tf.reduce_ functions such as reduce_all, reduce_any, reduce_min, reduce_max and reduce_prod produce the same values as their numpy analogs. Because they are operations, they can clearly be executed only from inside a session.

Related

Layer normalization in pytorch

I'm trying to test the layer normalization function of PyTorch.
But I don't know why b[0] and result have different values here.
Did I do something wrong?
import numpy as np
import torch
import torch.nn as nn
a = torch.randn(1, 5)
m = nn.LayerNorm(a.size()[1:], elementwise_affine=False)
b = m(a)
Result:
input: a[0] = tensor([-1.3549, 0.3857, 0.1110, -0.8456, 0.1486])
output: b[0] = tensor([-1.5561, 1.0386, 0.6291, -0.7967, 0.6851])
mean = torch.mean(a[0])
var = torch.var(a[0])
result = (a[0]-mean)/(torch.sqrt(var+1e-5))
Result:
result = tensor([-1.3918, 0.9289, 0.5627, -0.7126, 0.6128])
And, for n*2 normalization, the result of PyTorch layer norm is always [1.0, -1.0] (or [-1.0, 1.0]). I can't understand why. Please let me know if you have any hints.
a = torch.randn(1, 2)
m = nn.LayerNorm(a.size()[1:], elementwise_affine=False)
b = m(a)
Result:
b = tensor([-1.0000, 1.0000])
To calculate the variance, use torch.var(a[0], unbiased=False). Then you will get the same result. By default, PyTorch calculates the unbiased estimate of the variance.
For your 1st question, as @Theodor said, you need to use unbiased=False when calculating the variance.
Only if you want to explore more: as your input size is 5, the unbiased estimate of the variance will be 5/4 = 1.25 times the biased estimate, because the unbiased estimate uses N-1 instead of N in the denominator. As a result, each value of the result that you generated is sqrt(4/5) = 0.8944 times the corresponding value of b[0].
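A minimal sketch of the corrected manual computation (the seed is arbitrary, just to make the run reproducible):

import torch
import torch.nn as nn

torch.manual_seed(0)
a = torch.randn(1, 5)
m = nn.LayerNorm(a.size()[1:], elementwise_affine=False)
b = m(a)

mean = torch.mean(a[0])
var = torch.var(a[0], unbiased=False)      # biased variance, as LayerNorm uses internally
result = (a[0] - mean) / torch.sqrt(var + 1e-5)
print(torch.allclose(b[0], result))        # True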
About your 2nd question:
And, for n*2 normalization , the result of pytorch layer norm is always [1.0 , -1.0]
This is reasonable. Suppose the only two elements are a and b. Then the mean is (a+b)/2 and the variance is ((a-b)^2)/4, so the normalized result is [((a-b)/2) / sqrt(variance), ((b-a)/2) / sqrt(variance)], which is essentially [1, -1] or [-1, 1] depending on whether a > b or a < b.
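A quick check with two arbitrary, distinct values (my own example, not from the question):

import torch
import torch.nn as nn

a = torch.tensor([[3.0, 7.0]])
m = nn.LayerNorm(a.size()[1:], elementwise_affine=False)
print(m(a))  # tensor([[-1.0000, 1.0000]]) since 3 < 7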

How to implement tf.argmax on our own?

I want to use a function which takes a tensor as input and returns the index with the largest value across the axes of the tensor. I know there exists a function tf.argmax() that does exactly this, but how do I implement it on my own (this may be necessary in case of implementing some custom function)?
Let us suppose for now that the function takes only a 1D tensor as input. So, the function needs to have the following signature:
argmax(
    input,  # input is a 1D tensor
    name=None
)
I tried implementing it this way:
def argmax(input, name=None):
    maxValue = 0
    maxIndex = 0
    for i in range(input.get_shape()[0]):
        if input[i] > maxValue:
            maxValue = input[i]
            maxIndex = i
    return maxIndex
However, this does not work, since during the graph construction phase the values are not yet initialized, and hence I cannot compare two values as I did in the above code. So, is there a way we can write our own custom functions like tf.argmax, tf.equal, etc.?
Well, one simple way would be this:
idx = tf.where(tf.equal(input, tf.reduce_max(input)))[0, 0]
Example:
import tensorflow as tf

with tf.Session() as sess:
    input = tf.constant([1, 3, 4, 2, 1, 2])
    idx = tf.where(tf.equal(input, tf.reduce_max(input)))[0, 0]
    print(sess.run(idx))
Output:
2
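One detail worth noting: tf.where returns every position where the maximum occurs, so the [0, 0] picks the first one. A quick sketch with made-up values:

import tensorflow as tf

with tf.Session() as sess:
    inp = tf.constant([1, 4, 4, 2])  # two elements tie for the maximum
    idx = tf.where(tf.equal(inp, tf.reduce_max(inp)))[0, 0]
    print(sess.run(idx))             # 1 -- index of the first maximum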

How to concatenate two tensors having different shape with TensorFlow?

Hello, I'm new to TensorFlow and I'd like to concatenate a 2D tensor to a 3D one. I don't know how to do it using TensorFlow functions.
tensor_3d = [[[1,2], [3,4]], [[5,6], [7,8]]] # shape (2, 2, 2)
tensor_2d = [[10,11], [12,13]] # shape (2, 2)
out: [[[1,2,10,11], [3,4,10,11]], [[5,6,12,13], [7,8,12,13]]] # shape (2, 2, 4)
I could make it work by using loops and new numpy arrays, but that way I wouldn't be using TensorFlow transformations. Any suggestions on how to make this possible? I don't see how transformations like tf.expand_dims or tf.reshape would help here...
Thanks for sharing your knowledge.
This should do the trick:
import tensorflow as tf

a = tf.constant([[[1,2], [3,4]], [[5,6], [7,8]]])
b = tf.constant([[10,11], [12,13]])

c = tf.expand_dims(b, axis=1)       # Add dimension
d = tf.tile(c, multiples=[1,2,1])   # Duplicate in this dimension
e = tf.concat([a,d], axis=-1)       # Concatenate on innermost dimension

with tf.Session() as sess:
    print(e.eval())
Gives:
[[[ 1  2 10 11]
  [ 3  4 10 11]]

 [[ 5  6 12 13]
  [ 7  8 12 13]]]
There is actually a different trick that is used from time to time in code bases such as OpenAI's baselines.
Suppose you have two tensors for your Gaussian policy, mu and std. The standard deviation has the same shape as mu for batch size 1, but because you use the same parameterized standard deviation for all actions, the two shapes differ when the batch size is larger than 1:
mu : Size<batch_size, feat_n>
std: Size<1, feat_n>
In this case a simple thing to do (and what the OpenAI baselines do) is:
params = tf.concat([mu, mu * 0 + std], axis=1)
The zero multiplication broadcasts std up to the same shape as mu, so the two tensors can be concatenated.
Enjoy, and good luck training!
ps: numpy's and tensorflow's concat operators do not automagically apply broadcasting, because according to the maintainers, when the shapes of two tensors don't match it is usually the result of a programming error. This is not a big deal in numpy because the computations are evaluated eagerly, but with tensorflow it means that you have to explicitly broadcast the lower-rank tensor (or the one that has shape [1, *_]) by hand, e.g. using the tf.shape operator.
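A minimal sketch of the trick, with made-up shapes (batch of 3, feature size 2):

import tensorflow as tf

mu  = tf.constant([[1., 2.], [3., 4.], [5., 6.]])  # shape (3, 2)
std = tf.constant([[0.1, 0.2]])                    # shape (1, 2)

# mu * 0 + std broadcasts std up to mu's shape, so the concat shapes match
params = tf.concat([mu, mu * 0 + std], axis=1)     # shape (3, 4)

with tf.Session() as sess:
    print(sess.run(params))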

Tensorflow: use different expression for forward and backward pass

I have a tensorflow expression where I want to use a different expression depending on whether I'm computing the forward or backward (gradient) pass. Specifically, I want to ignore the effects of some randomness (noise) added into the network during the backwards pass.
Here's a simplified example
import numpy as np
import tensorflow as tf
x = tf.placeholder(tf.float32)
y = x**2
u = tf.random_uniform(tf.shape(x), minval=0.9, maxval=1.1)
yu = y * u
z = tf.sqrt(yu)
g = tf.gradients(z, x)[0]
with tf.Session() as sess:
    yv, yuv, zv, gv = sess.run([y, yu, z, g], {x: [-2, -1, 1]})
    print(yv)
    print(yuv)
    print(zv)
    print(gv)
which outputs something like
[4. 1. 1.]
[4.1626534 0.9370764 1.0806011]
[2.0402582 0.96802706 1.0395197 ]
[-1.0201291 -0.96802706 1.0395197 ]
The last values here are the derivative of z with respect to x. I would like them to not include the multiplicative noise term u, i.e. they should consistently be [-1, -1, 1] for these input values of x.
Is there a way to do such a thing only using Python? I know I can make a custom operator in C and define a custom gradient for it, but I'd like to avoid this if possible.
Also, I'm hoping to use this as part of a Keras layer, so a Keras-based solution would be an alternative (i.e. if one could define a different expression for the forwards and backwards pass through a Keras layer). This does mean that just defining a second expression z2 = tf.sqrt(y) and calling gradients on that isn't a solution for me, though, because I don't know how I would put that in Keras (since in Keras, it will be part of a very long computational graph).
The short answer is that Sergey Ioffe's trick, which you mentioned above, will only work if it's applied at the very end of the graph, right before the gradient computation.
I am assuming that you tried the following, which will not work:
yu_fixed = tf.stop_gradient(yu - y) + y
z = tf.sqrt(yu_fixed)
This still outputs random-tainted gradients.
To see why, let's follow along the gradient computation. Let's use s as shorthand for tf.stop_gradient. The way this works is that when TensorFlow needs to compute s(expr), it just returns expr, but when it needs to compute the gradient of s(expr), it returns 0.
We want to compute the gradient of z = sqrt(s(yu - y) + y). Now, because
d/dx sqrt(f(x)) = f'(x) / (2 * sqrt(f(x))),
we find that the gradient of z contains both a term with the derivative of s() (in the numerator, which the stop-gradient does zero out) and a term containing s() itself (inside the square root in the denominator, which it does not). This latter term will not zero out the s() portion, so the computed derivative of z will depend (in some odd and incorrect way) on the value yu. This is why the above solution still contains randomness in the gradient.
As far as I can see, the only way to work around this is to apply Ioffe's trick as the last stage before the tf.gradient. In other words, if you do something like the following, you will get the expected result:
x = tf.placeholder(tf.float32)
y = x**2
u = tf.random_uniform(tf.shape(x), minval=0.9, maxval=1.1)
yu = y * u
z = tf.sqrt(yu)
z_fixed = tf.stop_gradient(z - tf.sqrt(y)) + tf.sqrt(y)
g = tf.gradients(z_fixed, x)[0]
with tf.Session() as sess:
    yv, yuv, zv, gv = sess.run([y, yu, z_fixed, g], {x: [-2, -1, 1]})
    print(yv)
    print(yuv)
    print(zv)
    print(gv)
Output:
[ 4. 1. 1.]
[ 3.65438652 1.07519293 0.94398856]
[ 1.91164494 1.03691506 0.97159076]
[-1. -1. 1.]

TensorFlow, when can Python-like negative indexing be used if ever?

I'm new to TensorFlow (version 1.2), but not to Python or Numpy. I am building a model to predict the shape of a protein molecule. I need to wrap TensorFlow's standard tf.losses.cosine_distance function in some extra code, because I need to stop the propagation of some NaN values into the loss calculation.
I know exactly which cells will be NaN. Whatever my machine learning system predicts for those cells does not count. I plan to turn the NaN part of the output of tf.losses.cosine_distance into zeros before summing up the loss function.
Here's a snippet of working code, using tf.scatter_nd_update for the element assignment:
def custom_distance(predict, actual):
    with tf.name_scope("CustomDistance"):
        loss = tf.losses.cosine_distance(predict, actual, -1,
                                         reduction=tf.losses.Reduction.NONE)
        loss = tf.Variable(loss)  # individual elements can be modified
        indices = tf.constant([[0,0,0],[29,1,0],[29,2,0]])
        updates = tf.constant([0., 0., 0.])
        loss = tf.scatter_nd_update(loss, indices, updates)
        return loss
But, that only works on the one protein that I have that is 30 amino acids long. What if I have a protein of a different length? I will have many.
In Numpy, I would just use Python's negative indexing, and substitute -1's for the two 29's on the indices line. Tensorflow will not accept that. If I make that substitution, I get a long traceback, but I think that the most important part of it is this:
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid indices: [0,1] = [-1, 1, 0] is not in [0, 30)
(I could also modify the predict Tensor so that the cells in question exactly match the actual Tensor before calculating the loss, but in each case the challenge is the same: to assign the values of individual elements in a TensorFlow object.)
Should I just forget about negative indexing in TensorFlow? I am poring through the TensorFlow docs to understand the correct approach to this problem. I assume that I can retrieve the length of my input Tensors along the primary axis and use that. But after seeing the strong parallels between TensorFlow and Numpy, I have to wonder whether that's clunky.
Thanks for your suggestions.
Negative indexing can be used with TensorFlow's bindings to Python's slicing operators. So, for example, loss[-1] is a valid slice of loss.
In your case, if you have only three slices, you could assign them individually:
update_op0 = loss[0,0,0].assign(updates[0])
update_op1 = loss[-1,1,0].assign(updates[1])
update_op2 = loss[-1,2,0].assign(updates[2])
If you have more slices than that, or a variable number of slices, the previous approach is not practical. You can rather write a small helper function like this to convert "positive or negative indices" to "positive only indices":
def to_pos_idx(idx, x):
    # TODO: shape & bound checking
    idx = tf.convert_to_tensor(idx)
    s = tf.shape(x)[:tf.size(idx)]
    idx = tf.where(idx < 0, s + idx, idx)
    return idx
and modify your code like this:
indices = tf.constant([[0,0,0],[-1,1,0],[-1,2,0]])
indices = tf.map_fn(lambda i: to_pos_idx(i, loss), indices) # transform indices here
updates = tf.constant([0., 0., 0.])
loss = tf.scatter_nd_update(loss, indices, updates)
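For instance, with a hypothetical loss of shape (30, 3, 1) (a 30-amino-acid protein), the helper maps the negative index back to 29; a minimal check, not part of the original answer:

import tensorflow as tf

loss = tf.zeros([30, 3, 1])  # hypothetical loss shape for illustration
idx = to_pos_idx(tf.constant([-1, 1, 0]), loss)
with tf.Session() as sess:
    print(sess.run(idx))     # [29  1  0]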
