What variables does global_variables_initializer() initialize? - python

In TensorFlow, after I use cell.zero_state() to initialize the cell state and hidden state, I still have to initialize the global variables or the RNN cell won't run.
However, I wonder how it initializes them (over what range of values?) and which variables it covers (bias? weight? activation function?).
I think the parameters that need initializing are none other than the weight, bias, and activation function in each neuron.
What does the global_variables_initializer actually do?
Thanks a lot!

Whenever you create a variable in TensorFlow, the framework takes care of adding this variable to a collection of created variables. Think of a list with pointers to variables.
The default collection of such variables is called GraphKeys.GLOBAL_VARIABLES.
The function tf.global_variables_initializer simply retrieves all these variables from the collection and initializes them.
zero_state does not create a variable. It simply returns an all-zero tensor whose shape matches the cell's state.
The range of the initial values depends on each variable's initializer.
To sum up: each weight, bias, and other created variable is collected in a special list, and TensorFlow just initializes each of them, similar to this pseudo-code:
for each v in GraphKeys.GLOBAL_VARIABLES:
    assign v.value = v.call_initializer()
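A minimal TF 1.x sketch of what this amounts to; the collection lookup is the real API, while the two variables are just an illustration:

import tensorflow as tf

# Creating variables registers them in the GLOBAL_VARIABLES collection.
w = tf.Variable(tf.random_normal([4, 4]), name='weight')
b = tf.Variable(tf.zeros([4]), name='bias')

# These are exactly the variables global_variables_initializer() will touch.
print([v.name for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)])
# ['weight:0', 'bias:0']

init = tf.global_variables_initializer()  # one op that runs every variable's initializer
with tf.Session() as sess:
    sess.run(init)       # assigns each variable its initial value
    print(sess.run(b))   # [0. 0. 0. 0.]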

Related

Why is AdamOptimizer duplicated in my graph?

I am fairly new to the internals of TensorFlow. While trying to understand TensorFlow's implementation of AdamOptimizer, I checked the corresponding subgraph in TensorBoard. There seems to be a duplicate subgraph named name + '_1', where name='Adam' by default.
The following MWE produces the graph below. (Note that I have expanded the x node!)
import tensorflow as tf

tf.reset_default_graph()
x = tf.Variable(1.0, name='x')
train_step = tf.train.AdamOptimizer(1e-1, name='MyAdam').minimize(x)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    with tf.summary.FileWriter('./logs/mwe') as writer:
        writer.add_graph(sess.graph)
I am confused because I would expect the above code to produce just a single namespace inside the graph. Even after examining the relevant source files (namely adam.py, optimizer.py and training_ops.cc), it's not clear to me how/why/where the duplicate is created.
Question: What is the source of the duplicate AdamOptimizer subgraph?
I can think of the following possibilities:
A bug in my code
Some sort of artifact generated in TensorBoard
This is expected behavior (if so, then why?)
A bug in TensorFlow
Edit: Cleanup and clarification
Due to some initial confusion, I cluttered my original question with detailed instructions for how to set up a reproducible environment with TensorFlow/TensorBoard which reproduces this graph. I have now replaced all that with the clarification about expanding the x node.
This is not a bug, just a perhaps questionable way for the optimizer to leak outside of its own scope.
First, not a bug: The Adam optimizer is not duplicated. As can be seen in your graph, there is a single /MyAdam scope, not two. No problem here.
However, there are two MyAdam and MyAdam_1 subscopes added to your variable scope. They correspond respectively to the m and v variables (and their initialization operations) of the Adam optimizer for this variable.
This is where the choices made by the optimizer are debatable. You could indeed reasonably expect the Adam optimizer's operations and variables to be strictly defined within its assigned scope. Instead, they choose to creep into the optimized variables' scopes to place the statistics variables.
So, debatable choice to say the least, but not a bug, in the sense that the Adam optimizer is indeed not duplicated.
EDIT
Note that this way of locating variables is common across optimizers -- you can observe the same effect with a MomentumOptimizer for example. Indeed, this is the standard way of creating slots for optimizers -- see here:
# Scope the slot name in the namespace of the primary variable.
# Set "primary.op.name + '/' + name" as default name, so the scope name of
# optimizer can be shared when reuse is True. Meanwhile when reuse is False
# and the same name has been previously used, the scope name will add '_N'
# as suffix for unique identifications.
So as I understand it, they chose to locate the statistics of a variable within a subscope of the scope of the variable itself, so that if the variable is shared/reused, then its statistics are also shared/reused and do not need to be recomputed. This is indeed a reasonable thing to do, even if again, creeping outside of your scope is somewhat unsettling.
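A quick way to observe this slot placement, assuming the MWE from the question (variable x, optimizer named MyAdam), is to list the global variables and look at their names:

import tensorflow as tf

tf.reset_default_graph()
x = tf.Variable(1.0, name='x')
train_step = tf.train.AdamOptimizer(1e-1, name='MyAdam').minimize(x)

# The m and v slots live under the variable's scope, not under a top-level MyAdam scope.
for v in tf.global_variables():
    print(v.name)
# Expect x:0 plus the slot variables x/MyAdam:0 and x/MyAdam_1:0
# (along with Adam's beta1_power/beta2_power bookkeeping variables).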

What is "Const:0" in Tensor Object [duplicate]

I wonder if this is the correct understanding:
All tensors are derived from some operation, and operations are either given a name in the constructor or given the default name for that particular kind of operation. If the name is not unique, TensorFlow automatically handles this by appending "_1", "_2", etc. An operation with n tensor outputs names these tensors "op_name:0", "op_name:1", ..., "op_name:n-1".
One problem seems to arise: if x is a tf.Variable, then x.name gives "variable_name:0". This is confusing: to what does "variable_name" refer?
Your observations on Tensor naming are absolutely correct: the name of a Tensor is the concatenation of
the name of the operation that produced it,
a colon (:), and
the index of that tensor in the outputs of the operation that produced it.
Therefore the tensor named "foo:2" is the output of the op named "foo" at position 2 (with indices starting from zero).
The naming of tf.Variable objects is slightly strange. Every tf.Variable contains a mutable tensor object that holds the state of the variable (and a few other tensors). A "Variable" op (which has the name "variable_name" in your example) "produces" this mutable tensor as its 0th output each time it is run, so the name of the mutable tensor is "variable_name:0".
Since a tf.Variable is mostly indistinguishable from a tf.Tensor—in that it can be used in the same places—we took the decision to make variable names resemble tensor names, so the Variable.name property returns the name of the mutable tensor. (This contrasts with tf.QueueBase and tf.ReaderBase objects, which are not usable directly as tensors (instead you have to call methods on them to create ops that operate on their state), so these do not have a tensor-like name.)
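A small TF 1.x illustration of this naming scheme; the default op name for tf.constant is "Const", which is exactly where "Const:0" comes from:

import tensorflow as tf

tf.reset_default_graph()

c = tf.constant(3.0)   # op gets the default name "Const"
print(c.name)          # Const:0 -> output 0 of the op "Const"

d = tf.constant(4.0)   # same default name, so TensorFlow appends "_1"
print(d.name)          # Const_1:0

top = tf.nn.top_k([1.0, 2.0], k=1)        # an op with two outputs
print(top.values.name, top.indices.name)  # something like TopKV2:0 TopKV2:1

v = tf.Variable(1.0, name='x')  # the Variable op is named "x"
print(v.name)                   # x:0 -> the variable's mutable tensor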

Tensorflow: strange behavior in cifar10 example using custom variable creation method

This is a follow up question to this one.
I'm still working on the cifar10 example on the file cifar10.py and noticed some strange behavior regarding the creation of variables.
But first a side question: why are the variables created with a weight decay factor of wd=0.0 and not wd=None? That way you would have fewer vertices in the computation graph.
Next, the strange behavior. I added the following function to make it more convenient to create variables:
def _create_variable(name, shape, initializer, wd=None):
    dtype = tf.float16 if FLAGS.use_fp16 else tf.float32
    with tf.device('/cpu:0'):
        var = tf.get_variable(name, shape, dtype, initializer)
    if wd is not None:
        wd_val = tf.mul(tf.nn.l2_loss(var), wd, name='weight_loss')
        tf.add_to_collection('losses', wd_val)
    return var
When using this function to create the variables (with the original parameters), the computed logits start out in a range of about ±1e13 for the first batch and gradually improve, eventually reaching ±1.5. The loss, on the other hand, starts at around 400000 and keeps growing until it hits NaN.
When using the original functions to create the variables, the logits are in a range of ±1 right from the beginning and the loss starts at around 4.5, gradually getting smaller.
Can somebody explain to me what the difference between my and the provided functions for variable generation is, and why the effect is so huge? I don't see it.
The full code of my modified cifar10.py can be found here. To test it out, simply replace the original file with my version. To then switch between the original and my function, simply change line 212 to CUSTOM = False.
Thank you in advance.
Stupid me! I used my own function the wrong way: I passed the value meant for stddev as the mean and used the default stddev of 1.
The curse of not passing the arguments by name.
Anyway, why does this cause such a huge loss; sometimes even NaN?
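For context, here is a sketch of the kind of call mix-up described, assuming the weights use tf.truncated_normal_initializer as in the original cifar10.py (the numbers are only illustrative):

import tensorflow as tf

# Intended: a truncated normal with mean 0.0 and stddev 5e-2.
good = tf.truncated_normal_initializer(stddev=5e-2)

# Positional mix-up: 5e-2 is taken as the mean, while stddev stays at its default of 1.0,
# so the initial weights come out roughly 20x larger than intended.
bad = tf.truncated_normal_initializer(5e-2)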

Using a global variable versus calling the function that returns said variable repeatedly

I have a Python module which consists of a number of different functions.
Say my first function returns a variable which I want to use twice in the second, and the second returns a variable which I want to use four times in the third function (and so on).
Is it better to declare the variables I will want to use throughout the entire module as global, calling each function once to define its variable globally, rather than calling the functions more than once in order to use the variables they return?
Am I correct in saying that this is a trade-off between safety (not using global variables) and efficiency (not executing each function more than once if possible)?
def fn_for_reading_file():
    # (Insert code for prompting user for filename, opening and reading file)
    global file_as_string
    # (Insert code for assigning user's file to file_as_string)
    return file_as_string

fn_for_reading_file()

def extract_data_from_string():
    global my_list
    my_list = []
    # (Insert code for going through file_as_string and appending data to my_list)
    return my_list

extract_data_from_string()

def another_fn():
    # (Insert code which uses file_as_string and my_list)
    return fn_output

another_fn()
I would try to reframe the problem you're thinking about. If you're only thinking in terms of functional programming then yes, you're correct in that safety vs. more code is the basic trade off you're looking at.
However, there are a number of ways to get around your dilemma by reframing the problem. I obviously don't know what your code looks like, but it might be meaningful to think about building this functionality into a class. Rather than using global variables, set those values as class attributes with appropriate getters/setters, and then structure the module such that your functions become methods.
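A minimal sketch of that class-based reframing, mirroring the question's (hypothetical) functions: each value is computed once in a method, stored as an attribute, and read by later methods instead of relying on globals or re-running earlier functions.

class DataPipeline:
    def __init__(self, filename):
        self.filename = filename
        self.file_as_string = None
        self.my_list = None

    def read_file(self):
        # Read the file once and keep the result on the instance.
        with open(self.filename) as f:
            self.file_as_string = f.read()
        return self.file_as_string

    def extract_data(self):
        # Reuse the stored string instead of re-reading the file.
        self.my_list = self.file_as_string.split()
        return self.my_list

    def another_step(self):
        # Both earlier results are available as attributes.
        return len(self.file_as_string), len(self.my_list)

pipeline = DataPipeline('data.txt')
pipeline.read_file()
pipeline.extract_data()
print(pipeline.another_step())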

How to bind a name with multiple objects or values in python

I saw in a book about language description that says
On the other hand, a name can be bound to no object (a dangling pointer),
one object (the usual case), or several objects (a parameter name in a
recursive function).
How can we bind a name to several objects? Isn't that what we call an array, for example, where all elements have the same name but a different index? For a recursive function, take the example here:
x = 0

def f(y):
    global x
    x += 1
    if x < 4:
        y += 100
        f(y)
    else:
        return

f(100)
Is the name y bound to multiple values that are created recursively, since the name table already has y bound to an initial value which is then reproduced by the recursion?
EDIT: just run it in the Visualizer and see what it generates. :)
No.
A name is bound to one single object. When we are talking about Python, a name is either bound to a single object in a given context, or does not exist at all.
What happens is that the inner workings may have the name defined in several "layers", but your code will only see one of those.
If a name is a variable in a recursive function, you will only see whatever is bound to it in the current running context. Each time there is a function call in Python, the execution frame, which is an object holding several attributes of the running code, including a reference to the local variables, is frozen. In the called function, a new execution frame is created, and there the variable names are bound again to whatever new values they have in the called context. Your code just "sees" this instance.
Then there is the issue of global variables and built-in objects in Python: if a name is not a local variable in the function's execution context, it is looked up in the module's global variables (again, just one of those will be visible). And if the name is not defined in the globals, Python looks for it in the built-ins (globals()['__builtins__']), which is the last place searched.
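A small illustration of the "one binding per frame" point: each recursive call gets its own execution frame, so the parameter y is bound to a different object at each depth, and each frame only ever sees its own binding.

def f(y, depth=0):
    # Each call has its own frame, with its own binding for y.
    print('depth', depth, 'sees y =', y)
    if depth < 3:
        f(y + 100, depth + 1)  # binds a new y in the next frame
    # After the recursive call returns, this frame still sees its own y.
    print('back at depth', depth, 'y is still', y)

f(100)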
If I understand you correctly, you're asking about what rules Python has for creating variables in different scopes. Python uses lexical scoping on the function level.
It's hard to tell exactly what you're getting at with the code you've written, but, while there may be a different value associated with y in different scopes (with a value of y defined at each level of recursion), your code will only ever be able to see one at a time (the value defined at the scope in which you're operating).
To really understand scoping rules in Python, I would have a look at PEP 227. Also, have a look at this Stack Overflow question.
Finally, to be able to speak intelligently about what a "name" is in Python, I suggest you read about how Python is a "Call-By-Object" language.
At this point, we are capable of understanding that, instead of a "nametable", Python uses a dictionary to hold what is accessible in a given scope. See this answer for a little more detail. The implication is that you can never have two of the same name in a single scope (for the same reason you can't have two of the same key in a Python dictionary). So, while y may exist in a dictionary for a different scope, you have no way of accessing it, since you can only access the variables in the current scope's dictionary.
The key is:
several objects (a parameter name in a recursive function).
The passage is almost certainly not referring to arrays, but simply to the fact that in a recursive function (or any function, but a recursive function is likely to have multiple activations at one time), a parameter may be bound to a different value in each recursive call.
This does not mean that you can access each such object in every stack frame; indeed the point of the technique is to ensure that only one such value is accessible in each stack frame.
Firstly, you should mention in the question that the sentence from the book is not related explicitly to Python (as jsbueno wrote, one name is bound to exactly one object in Python).
Anyway, a name bound to no object is a bit inaccurate. Generally, names are related to variables, and a name related to a dangling pointer is the name of that pointer variable.
When speaking about variable scope (i.e. the part of the code where the variable is used), one variable name can refer to only a single value at a time. However, there may be other parts of the code, independent of the one where we are thinking about that variable. In those other parts, the same name can be used; however, the two variables with the same name are totally isolated. This is the case for local variables in function bodies. If the language allows recursion, it must be able to create another isolated space of local variables for each call of the same function.
In Python, each function can also access outer variables, but it is more usual to use the inner, local ones. Whenever you assign a value to a name, the name is created in the local space.
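A tiny illustration of that last point: assigning to a name inside a function creates a local binding, and the module-level name stays untouched unless you declare it global.

x = 'outer'

def assigns_locally():
    x = 'inner'   # creates a new local name; the module-level x is untouched
    return x

def assigns_globally():
    global x      # opt in to rebinding the module-level name
    x = 'changed'
    return x

print(assigns_locally(), x)   # inner outer
print(assigns_globally(), x)  # changed changed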
