What does the function control_dependencies do?

I would like to have an example illustrating the use of the function tf.control_dependencies. For example, I want to create two tensors X and Y and if they are equal do or print something.
import tensorflow as tf

session = tf.Session()
X = tf.constant(5)
Y = tf.constant(50)
with tf.control_dependencies([tf.assert_equal(X, Y)]):
    print('X and Y are equal!')
In the code above, X is clearly not equal to Y. What is tf.control_dependencies doing in this case?

control_dependencies is not a conditional. It is a mechanism to add dependencies to whatever ops you create in the with block. More specifically, what you specify in the argument to control_dependencies is ensured to be evaluated before anything you define in the with block.
In your example, you don't create any (TensorFlow) operations inside the with block; the print call is plain Python and runs once at graph-construction time, so the block does nothing.
This answer has an example of how to use control_dependencies, where it is used to make sure the assignments happen before the batchnorm operations are evaluated.

What is the gradient of pytorch floor() gradient method?

I am looking to use the floor() method in one of my models. I would like to understand how PyTorch handles gradient propagation through it, since floor is a discontinuous function.
If there is no gradient defined, I could override the backward method to define my own gradient as necessary but I would like to understand what the default behavior is and the corresponding source code if possible.
import torch
x = torch.rand(20, requires_grad=True)
y = 20*x
z = y.floor().sum()
z.backward()
x.grad returns zeros.
Inspecting the grad_fn chain of z shows a FloorBackward node, so FloorBackward is the gradient method. But there is no reference to the source code of FloorBackward in the PyTorch repository.
The floor function is piecewise constant, so its gradient must be zero almost everywhere.
While the code doesn't say anything about it, I expect that the gradient is set to a constant zero everywhere.
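A quick check confirms this (a sketch; a fixed input is used instead of torch.rand so the result is deterministic):

```python
import torch

x = torch.tensor([0.1, 0.5, 0.9], requires_grad=True)
z = (20 * x).floor().sum()
z.backward()

# floor() is piecewise constant, so its derivative is zero almost
# everywhere; autograd propagates an all-zero gradient back through
# the chain rule, regardless of the input values.
print(x.grad)  # tensor([0., 0., 0.])
```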

Why is AdamOptimizer duplicated in my graph?

I am fairly new to the internals of TensorFlow. To understand TensorFlow's implementation of AdamOptimizer, I checked the corresponding subgraph in TensorBoard. There appears to be a duplicate subgraph named name + '_1', where name='Adam' by default.
The following MWE produces the graph below. (Note that I have expanded the x node!)
import tensorflow as tf

tf.reset_default_graph()
x = tf.Variable(1.0, name='x')
train_step = tf.train.AdamOptimizer(1e-1, name='MyAdam').minimize(x)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    with tf.summary.FileWriter('./logs/mwe') as writer:
        writer.add_graph(sess.graph)
I am confused because I would expect the above code to produce just a single namespace inside the graph. Even after examining the relevant source files (namely adam.py, optimizer.py and training_ops.cc), it's not clear to me how/why/where the duplicate is created.
Question: What is the source of the duplicate AdamOptimizer subgraph?
I can think of the following possibilities:
A bug in my code
Some sort of artifact generated in TensorBoard
This is expected behavior (if so, then why?)
A bug in TensorFlow
Edit: Cleanup and clarification
Due to some initial confusion, I cluttered my original question with detailed instructions for how to set up a reproducible environment with TensorFlow/TensorBoard which reproduces this graph. I have now replaced all that with the clarification about expanding the x node.
This is not a bug, just a perhaps questionable way of leaking outside of your own scope.
First, not a bug: The Adam optimizer is not duplicated. As can be seen in your graph, there is a single /MyAdam scope, not two. No problem here.
However, there are two MyAdam and MyAdam_1 subscopes added to your variable scope. They correspond respectively to the m and v variables (and their initialization operations) of the Adam optimizer for this variable.
This is where the choices made by the optimizer are debatable. You could reasonably expect the Adam optimizer's operations and variables to be defined strictly within its assigned scope. Instead, they creep into the optimized variables' scopes to locate the statistics variables.
So, debatable choice to say the least, but not a bug, in the sense that the Adam optimizer is indeed not duplicated.
EDIT
Note that this way of locating variables is common across optimizers -- you can observe the same effect with a MomentumOptimizer for example. Indeed, this is the standard way of creating slots for optimizers -- see here:
# Scope the slot name in the namespace of the primary variable.
# Set "primary.op.name + '/' + name" as default name, so the scope name of
# optimizer can be shared when reuse is True. Meanwhile when reuse is False
# and the same name has been previously used, the scope name will add '_N'
# as suffix for unique identifications.
So as I understand it, they chose to locate the statistics of a variable within a subscope of the scope of the variable itself, so that if the variable is shared/reused, then its statistics are also shared/reused and do not need to be recomputed. This is indeed a reasonable thing to do, even if again, creeping outside of your scope is somewhat unsettling.

What are the operations allowed in tensorflow loss function definition?

I learned that we need to use tf.* operations to define the computation graph, but I found that sometimes using + or = works just fine, without tf.add or tf.assign (see here).
My question is: which operations are allowed in a TensorFlow loss function definition without using explicit tf.* operations? In other words, other than + and =, what else can we use? For example, * or ^2 on variables?
PS: I just do not understand why x*x is ok but x^2 is not ...
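The PS essentially answers itself once Python's operator semantics are spelled out: TensorFlow overloads the arithmetic dunder methods on tensors, so operators like +, * and ** dispatch to tf.add, tf.multiply and tf.pow, while ^ is Python's bitwise XOR operator and has nothing to do with squaring. A pure-Python sketch with a hypothetical Tensor class shows the dispatch mechanism:

```python
class Tensor:
    """Hypothetical stand-in for a TF tensor, to show operator dispatch."""
    def __init__(self, value):
        self.value = value

    def __add__(self, other):   # what + dispatches to (like tf.add)
        return Tensor(self.value + other.value)

    def __mul__(self, other):   # what * dispatches to (like tf.multiply)
        return Tensor(self.value * other.value)

    def __pow__(self, n):       # what ** dispatches to (like tf.pow)
        return Tensor(self.value ** n)

x = Tensor(3.0)
print((x * x).value)   # 9.0 -- * is overloaded, so x*x works
print((x ** 2).value)  # 9.0 -- ** is Python's power operator
# x ^ 2 raises TypeError here: ^ maps to __xor__ (bitwise XOR), which is
# not defined, and even on integer tensors it would mean XOR, not squaring.
```

Note also that plain = is just Python name binding, not an op; only tf.assign actually mutates a variable in the graph.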

if operation in tensorflow

I'd like to use an if operation in TensorFlow; as I've learned, tf.cond() should be used here. But I'm confused about what to do if I only want an if branch and no else branch. For example:
a = tf.constant(10)
b = tf.constant(5)
for i in range(5):
    tmp = tf.greater_equal(a, b)
    result = tf.cond(tmp, lambda: tf.constant(i), None)  # fails: else branch missing
    if result is not None:
        return i
As above, I want to do nothing in the else branch, but tf.cond() requires that both branches return a value. Hope someone can provide some help.
To answer the actual question, if you understand the graph execution model of Tensorflow, you will see that after the conditional, your execution flow must converge again. Meaning that the operations you perform on the result of tf.cond must be technically possible independent of the branch taken in tf.cond. If they are dependent, they should be part of the tf.cond itself.
For the convergence to happen, you have to return same looking tensors from both branches. If you "don't want to do anything" in the else branch, you can just return some zero tensors or some reshaped input from there. What exactly makes sense, depends on your exact use case.
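A sketch of that pass-through pattern (the values are illustrative):

```python
import tensorflow as tf

a = tf.constant(10)
b = tf.constant(5)

# Both branches are mandatory and must return tensors of matching
# dtype/shape so the graph can converge after the conditional.
# A "do nothing" else branch simply passes a value through unchanged.
result = tf.cond(tf.greater_equal(a, b),
                 lambda: a - b,           # taken when a >= b
                 lambda: tf.identity(a))  # no-op else: just returns a
```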

Apply doit() function in SymPy only to the outer part of an expression

The doit() function in SymPy evaluates expressions whenever possible. For instance:
from sympy import *
u = IndexedBase('u')
i = symbols('i')
test = Sum(u[i],(i,1,3))
test.doit()
This will return u[1] + u[2] + u[3].
Also:
from sympy import *
u,x = symbols('u, x')
test = Derivative(u,x)
test.doit()
This creates the symbolic derivative and then evaluates it; the evaluation turns out to be zero in this case, since u does not depend on x.
But what if I wanted a Derivative inside of a Sum? The doit() function goes a step too far:
from sympy import *
u = IndexedBase('u')
x = IndexedBase('x')
i = symbols('i')
test = Sum(Derivative(u[i],x[i]),(i,1,3))
test.doit()
This will again return zero. I would like to expand the sum but not actually evaluate the derivative; the result should be Derivative(u[1], x[1]) + Derivative(u[2], x[2]) + Derivative(u[3], x[3]).
How can I get this as my output? Is there a way to have the doit() command work only on the outer function (Sum()) but not the inner function (Derivative())? Am I doing this wrong?
I found out one way to do it.
Upon careful inspection of the doit() arguments, it appears that setting the option deep=False prevents the evaluation from going too deep into the expression. Furthermore, some indications show it is possible to control this more thoroughly. The command documentation says:
Evaluate objects that are not evaluated by default like limits,
integrals, sums and products. All objects of this kind will be
evaluated recursively, unless some species were excluded via 'hints'
or unless the 'deep' hint was set to 'False'.
For my part, I am very curious how the 'hints' can be harnessed. If someone can provide additional insight as to how the 'hints' work that would be great. Thanks.
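Applied to the example from the question, deep=False gives the desired expansion (a sketch, using the same names as the question's code):

```python
from sympy import Sum, Derivative, IndexedBase, symbols

u = IndexedBase('u')
x = IndexedBase('x')
i = symbols('i')

test = Sum(Derivative(u[i], x[i]), (i, 1, 3))

# deep=False expands the outer Sum but does not recurse into its terms,
# so the Derivative objects are left unevaluated.
result = test.doit(deep=False)
print(result)  # the sum is expanded; each Derivative stays symbolic
```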
