Tensorflow: Using Adam optimizer - python

I am experimenting with some simple models in tensorflow, including one that looks very similar to the first MNIST for ML Beginners example, but with a somewhat larger dimensionality. I am able to use the gradient descent optimizer with no problems, getting good enough convergence. When I try to use the ADAM optimizer, I get errors like this:
tensorflow.python.framework.errors.FailedPreconditionError: Attempting to use uninitialized value Variable_21/Adam
[[Node: Adam_2/update_Variable_21/ApplyAdam = ApplyAdam[T=DT_FLOAT, use_locking=false, _device="/job:localhost/replica:0/task:0/cpu:0"](Variable_21, Variable_21/Adam, Variable_21/Adam_1, beta1_power_2, beta2_power_2, Adam_2/learning_rate, Adam_2/beta1, Adam_2/beta2, Adam_2/epsilon, gradients_11/add_10_grad/tuple/control_dependency_1)]]
where the specific variable that complains about being uninitialized changes depending on the run. What does this error mean? And what does it suggest is wrong? It seems to occur regardless of the learning rate I use.

The AdamOptimizer class creates additional variables, called "slots", to hold values for the "m" and "v" accumulators.
See the source here if you're curious; it's actually quite readable:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/training/adam.py#L39
Other optimizers, such as Momentum and Adagrad, use slots too.
These variables must be initialized before you can train a model.
The normal way to initialize variables is to call tf.initialize_all_variables(), which adds ops to initialize the variables present in the graph when it is called.
(Aside: despite its name, initialize_all_variables() does not initialize anything; it only adds ops that will initialize the variables when run.)
What you must do is call initialize_all_variables() after you have added the optimizer:
...build your model...
# Add the optimizer
train_op = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
# Add the ops to initialize variables. These will include
# the optimizer slots added by AdamOptimizer().
init_op = tf.initialize_all_variables()
# Launch the graph in a session
sess = tf.Session()
# Actually initialize the variables
sess.run(init_op)
# Now train your model
for ...:
    sess.run(train_op)
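To make the "m" and "v" accumulators concrete, here is a minimal plain-Python sketch of an Adam update for scalar parameters. It is not TensorFlow's implementation; the class name ToyAdam and its dictionaries are made up for illustration. The point is only that Adam keeps extra per-parameter state next to each weight, and that state has to exist before the first update can run.

```python
import math


class ToyAdam:
    """Scalar Adam in plain Python (illustration only, not TensorFlow code)."""

    def __init__(self, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        # Per-parameter state, playing the role of the "m" and "v" slots.
        self.m = {}
        self.v = {}
        self.t = 0  # number of steps taken, used for bias correction

    def step(self, params, grads):
        self.t += 1
        for name, g in grads.items():
            # A slot starts at zero the first time its parameter is seen.
            m = self.beta1 * self.m.get(name, 0.0) + (1 - self.beta1) * g
            v = self.beta2 * self.v.get(name, 0.0) + (1 - self.beta2) * g * g
            self.m[name], self.v[name] = m, v
            # Bias-corrected estimates of the first and second moments.
            m_hat = m / (1 - self.beta1 ** self.t)
            v_hat = v / (1 - self.beta2 ** self.t)
            params[name] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


params = {"w": 5.0}
opt = ToyAdam()
for _ in range(200):
    grads = {"w": 2.0 * params["w"]}  # gradient of the toy loss w**2
    opt.step(params, grads)
```

Plain gradient descent has no such state, which is consistent with the question: the error only shows up once an optimizer with slots is used.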

FailedPreconditionError: Attempting to use uninitialized value is one of the most frequent errors in TensorFlow. From the official documentation for FailedPreconditionError:
This exception is most commonly raised when running an operation that
reads a tf.Variable before it has been initialized.
In your case the error even says which variable was not initialized: Attempting to use uninitialized value Variable_1. One of the TF tutorials explains a lot about variables: their creation, initialization, saving, and loading.
Basically, to initialize a variable you have 3 options:
initialize all global variables with tf.global_variables_initializer()
initialize variables you care about with tf.variables_initializer(list_of_vars). Notice that you can use this function to mimic global_variables_initializer: tf.variables_initializer(tf.global_variables())
initialize only one variable with var_name.initializer
I almost always use the first approach. Remember you should put it inside a session run. So you will get something like this:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
If you are curious and want more information about variables, read this documentation to learn how to use report_uninitialized_variables and is_variable_initialized.

You need to run tf.global_variables_initializer() in your session, like this:
init = tf.global_variables_initializer()
sess.run(init)
A full example is available in this great tutorial:
https://www.tensorflow.org/get_started/mnist/mechanics

Run the initializer after creating the AdamOptimizer, and do not define or run the init op before that point:
sess.run(tf.initialize_all_variables())
or
sess.run(tf.global_variables_initializer())
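Why the order matters can be shown with a toy model in plain Python. ToyGraph and its methods are hypothetical and only mimic the behaviour described in the answers above: an initializer covers the variables that exist at the moment it is created, so anything added later (such as optimizer slots) is missed.

```python
class ToyGraph:
    """Hypothetical stand-in for a graph that only tracks variable names."""

    def __init__(self):
        self.variables = []

    def add_variable(self, name):
        self.variables.append(name)

    def make_initializer(self):
        # Snapshot of the variables that exist right now.
        return list(self.variables)


def uninitialized(graph, init_op):
    """Names of variables in the graph that this initializer would not cover."""
    return [v for v in graph.variables if v not in init_op]


graph = ToyGraph()
graph.add_variable("W")
early_init = graph.make_initializer()   # created before the optimizer
graph.add_variable("W/Adam")            # slots added when the optimizer is built
graph.add_variable("W/Adam_1")
late_init = graph.make_initializer()    # created after the optimizer

missed_by_early = uninitialized(graph, early_init)
missed_by_late = uninitialized(graph, late_init)
```

In this sketch the early initializer leaves both slot names uncovered, while the late one covers everything.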

I was having a similar problem. (No problems training with the GradientDescent optimizer, but an error was raised when switching to the Adam optimizer, or any other optimizer with its own variables.)
Changing to an interactive session solved this problem for me.
sess = tf.Session()
into
sess = tf.InteractiveSession()

Related

Tensorflow graph inside class - how to manage sessions and scopes

I am trying to build a generic tensorflow infrastructure wrapped inside a simple one layer NN class (see code below).
I will be creating many NNets so I was wondering what was the best way to manage the sessions and the variables.
Typically, I'd like to get tf.trainable_variables() for only one network, not all of them (in the "show" function) so that I can print the network I want.
I also have to pass the session variable "sess" to every function, so that the variables are not re-initialized.
I think I am not doing everything properly... Can someone help ?
class oneLayerNN:
    """
    Implements a 1 hidden-layer neural network: y = W2 * ([W1 * x + b1]+) + b1
    """
    def __init__(self, ...):
        ...
        self.initOp = tf.global_variables_initializer()
    def show(self, sess):
        tvars = tf.trainable_variables()
        tvals = sess.run(tvars)
        for var, val in zip(tvars,tvals):
            print(var.name, val)
            print()
    def initializeVariables(self, sess):
        sess.run(self.initOp)
    def forwardPropagation(self, sess, x):
        labels = sess.run(self.yHat, feed_dict={self.x: x})
        return labels
    def train(self, sess, dataset, epochs, batchSize, debug=False, verbose=False):
        dataset = dataset.batch(batchSize)
        iterator = dataset.make_initializable_iterator()
        next_element = iterator.get_next()
        for epoch in range(epochs):
            sess.run(iterator.initializer)
            while True:
                try:
                    batch_x, batch_y = sess.run(next_element)
                    _, c = sess.run([self.optimizer, self.loss], feed_dict={self.x: batch_x, self.y: batch_y})
                except tf.errors.OutOfRangeError:
                    break
with tf.Session() as sess:
    network.initializeVariables(sess)
    network.show(sess)
It is probably a matter of taste and of how you intend to use your objects.
If it is OK for you to limit your objects to deal with a single tf.Session (as in Keras — should cover basic needs and probably a bit beyond), then you could simply instantiate a single tf.Session via your preferred Singleton-like pattern (maybe just plain old functions like in Keras).
Thanks for your answers.
However, I still have issues with the scopes of variables. How can I define variables as part of my object? I want to be able to do something like:
vars = network.getTrainableVariables()
And that should return only the variables defined in that object (not like tf.trainable_variables())
I can't find one example of a clean declaration of variables within a scope when using multiple networks at the same time (the scope being the name of the network for example).
At the moment when I run the code multiple times, it creates variables W,b, then W_1,b_1, then W_2,b_2 etc...
Also, I would like network.initialize() to initialize only the variables defined within this graph, not all variables in every network...
A solution would be to declare variables for network within scope 'name' and then be able to reset_default_graph within this 'name' scope but I am not able to do that.
I'd suggest using tf.keras.Model to manage state. Take a look at the subclassing section of the tf.keras documentation. There are training examples using Model.fit there, but you can also just call the object directly, and it will collect variables and losses for you in properties (variables, trainable_variables, losses, etc.).
Whatever you do, I'd separate the model definition (anything that manages Variable objects) from the training loop. And when defining the model, Variables should be attributes of the model definition object and created once (not necessarily in __init__, but protected by a guard such as if self.attribute is None: self.attribute = tf.Variable(...)).
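The "create once, keep as an attribute" advice can be sketched without TensorFlow. The LazyLayer class below is hypothetical and uses nested Python lists in place of tf.Variable; it only illustrates the guard pattern and a per-object variables property, not the tf.keras API.

```python
class LazyLayer:
    """Toy layer whose weights are created once, on first call (plain Python)."""

    def __init__(self, units):
        self.units = units
        self.kernel = None  # created lazily, once the input size is known

    def __call__(self, x):
        if self.kernel is None:
            # First call only: build the state exactly once.
            self.kernel = [[0.5] * self.units for _ in range(len(x))]
        # y[j] = sum_i x[i] * kernel[i][j]
        return [sum(xi * row[j] for xi, row in zip(x, self.kernel))
                for j in range(self.units)]

    @property
    def variables(self):
        # Only this object's state, unlike a graph-wide collection.
        return [] if self.kernel is None else [self.kernel]


layer = LazyLayer(units=2)
first = layer([1.0, 2.0])
kernel_after_first_call = layer.kernel
second = layer([1.0, 2.0])  # reuses the same kernel object
```

Because the state lives on the object, asking one network for its variables never returns another network's weights, which is what the follow-up question was after.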

Where should I call sess.run(iterator.initializer)?

I want to try the following code structure:
train_input_fn_try(): generates the data and returns a feature dict and a label; the data comes from tf.data.Dataset.from_tensor_slices((tf.random_uniform([100,5]),tf.random_uniform([100],maxval=4,dtype=tf.int32))).
In the main function: I call tf.estimator.DNNClassifier to get a classifier, then call classifier.train(input_fn=lambda :train_input_fn_try(batch_size=3),steps=6) to train it.
However, I found that I must call sess.run(iterator.initializer) before iterator.get_next().
Where should sess.run(iterator.initializer) be called without breaking this code structure: in the main function or in train_input_fn_try? And how?
Here is an example of code that does not work:
def train_input_fn_try(batch_size=2,epoch=1,shuffle=True):
    dataset=tf.data.Dataset.from_tensor_slices((tf.random_uniform([100,5]),tf.random_uniform([100],maxval=4,dtype=tf.int32)))
    if shuffle:
        dataset=dataset.shuffle(10000)
    dataset=dataset.repeat(epoch)
    dataset=dataset.batch(batch_size)
    iterator=dataset.make_initializable_iterator()
    with tf.Session() as sess:
        sess.run(iterator.initializer)
    text,label=iterator.get_next()
    return {"text":text},label
with tf.Session() as sess:
    my_feature_columns=[]
    my_feature_columns.append(tf.feature_column.numeric_column(key="text",shape=[5]))
    clf=tf.estimator.DNNClassifier(feature_columns=my_feature_columns,
                                   hidden_units=[10,10],n_classes=4)
    clf.train(input_fn=lambda :train_input_fn_try(batch_size=3),steps=6)
The runtime error is:
FailedPreconditionError (see above for traceback): GetNext() failed because the iterator has not been initialized. Ensure that you have run the initializer operation for this iterator before getting the next element.
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,5], [?]], output_types=[DT_FLOAT, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
[[Node: dnn/head/assert_range/assert_less/Assert/Assert/_106 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_83_dnn/head/assert_range/assert_less/Assert/Assert", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
Setting aside @Lescurel's very on-point comment, the problem you have is that you initialize the iterator in a different session than the one you then try to train in:
def train_input_fn_try(batch_size=2,epoch=1,shuffle=True):
    # [...]
    with tf.Session() as sess: # <<<<<<<<<<<<<<<<<<<<<<< This session...
        sess.run(iterator.initializer)
    text,label=iterator.get_next()
    return {"text":text},label
with tf.Session() as sess: # <<<<<<<<<<<<<<<<<<<<<<<<<<< ...is NOT the same as this one!
    # [...]
The with tf.Session() as sess: statement creates a new session instance and assigns it to sess, and that session is closed once you exit the with statement.
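The same behaviour can be reproduced with an ordinary Python context manager. ToySession below is a hypothetical stand-in, not tf.Session; it only shows that each with block creates its own object, that the object is closed on exit, and that state set up in one instance is not visible in another.

```python
class ToySession:
    """Hypothetical stand-in for a session object (not tf.Session)."""

    def __init__(self):
        self.closed = False
        self.initialized = set()  # names of things initialized in this session

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.closed = True  # leaving the with block closes the session
        return False

    def run_initializer(self, name):
        self.initialized.add(name)


def input_fn():
    # Mirrors the question: the iterator is initialized in a throwaway session.
    with ToySession() as inner:
        inner.run_initializer("iterator")
    return inner


inner = input_fn()
with ToySession() as outer:
    outer_is_inner = outer is inner
    outer_knows_iterator = "iterator" in outer.initialized
```

Here the inner session is already closed by the time the outer one starts, and the outer one has never seen the iterator initializer.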
For your code, the best solution is to just use a one-shot iterator, but if you really want to use the initializable one, pass sess as a parameter to train_input_fn_try and remove the with statement inside the function:
def train_input_fn_try(sess,batch_size=2,epoch=1,shuffle=True):
    # [...]
    sess.run(iterator.initializer)
    # [...]
Update: why this still doesn't work (with Estimators)
The way the Estimator framework works is approximately:
Make new graph
call input_fn to set up the input pipeline in the new graph
call model_fn to set up the model in the new graph
make a Session and start a training loop
When you make the lambda, the sess you pass is not the one that will be used by the estimator, so I'm afraid this won't work for you. I am not currently aware of a way to use other types of iterators with Estimators; you might have to stick to one-shot iterators.

TensorFlow FailedPreconditionError when using variables from the tf.metric module [duplicate]

I tried to add some additional measurements to my training code for a CNN by utilising the functions from the tf.metrics submodule, such as tf.metrics.accuracy(y_labels, y_predicted) and equivalents for precision or recall. This is done in contrast to most of their tutorials where they suggest the convoluted:
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Whereas my implementation replaces this line with:
accuracy = tf.metrics.accuracy(y_labels, y_predicted)
Now, even though I do the sess.run(tf.initialize_all_variables()) within my with tf.Session() as sess: block, I still get the following error when trying to use tf.metrics.accuracy function:
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value performance/accuracy/count
[[Node: performance/accuracy/count/read = Identity[T=DT_FLOAT, _class=["loc:@performance/accuracy/count"], _device="/job:localhost/replica:0/task:0/cpu:0"](performance/accuracy/count)]]
Most notably, replacing the accuracy = tf.metrics.accuracy(y_labels, y_predicted) line with accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) fixes the problem. However, I would like to implement other metrics such as precision and recall without doing it by hand.
TL;DR: Add the following line at the beginning of your session:
sess.run(tf.local_variables_initializer())
The confusion arises from the name of the (as frankyjuang points out) deprecated tf.initialize_all_variables() function. This function was deprecated in part because it is misnamed: it doesn't actually initialize all variables, and instead it only initializes global (not local) variables.
According to the documentation for the tf.metrics.accuracy() function (emphasis added):
The accuracy function creates two local variables, total and count that are used to compute the frequency with which predictions matches labels.
Therefore you need to add an explicit initialization step for the local variables, which can be done using tf.local_variables_initializer(), as suggested above.
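To illustrate what those two local variables do, here is a plain-Python sketch of a streaming accuracy metric. The StreamingAccuracy class and its reset method are made up for this example and are not the tf.metrics API; the names total and count follow the documentation quoted above.

```python
class StreamingAccuracy:
    """Toy streaming accuracy with explicit 'total' and 'count' state."""

    def __init__(self):
        # None models "declared but not yet initialized".
        self.total = None
        self.count = None

    def reset(self):
        # Plays the role of running the local-variables initializer.
        self.total = 0.0
        self.count = 0.0

    def update(self, labels, predictions):
        if self.total is None or self.count is None:
            raise RuntimeError("Attempting to use uninitialized value")
        self.total += sum(1.0 for y, p in zip(labels, predictions) if y == p)
        self.count += len(labels)
        return self.total / self.count


metric = StreamingAccuracy()
try:
    metric.update([1, 0], [1, 1])
    failed_before_reset = False
except RuntimeError:
    failed_before_reset = True

metric.reset()
acc_after_batch_1 = metric.update([1, 0], [1, 1])  # 1 correct out of 2
acc_after_batch_2 = metric.update([1, 1], [1, 1])  # 3 correct out of 4 overall
```

The metric accumulates across batches, which is why it needs state of its own, and why that state must be initialized separately from the model's weights.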
sess.run(tf.initialize_all_variables()) is deprecated.
Use sess.run(tf.global_variables_initializer()) instead to resolve your issue.
Reference
According to doc of tf.initialize_all_variables,
THIS FUNCTION IS DEPRECATED. It will be removed after 2017-03-02. Instructions for updating: Use tf.global_variables_initializer instead.

Train only some of the variables in tensorflow

I'm using TensorFlow to do gradient descent classification.
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
Here, cost is the cost function that I use in the optimization.
After launching the Graph in the Session, the Graph can be fed as:
sess.run(train_op, feed_dict)
With this, all the variables in the cost function are updated in order to minimize the cost.
Here is my question. How can I update only some variables in the cost function when training..? Is there a way to convert created variables into constants or something..?
There are several good answers, this subject should already be closed:
stackoverflow
Quora
Just to avoid another click for people getting here :
The minimize function of the tensorflow optimizer takes a var_list argument for that purpose:
first_train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                     "scope/prefix/for/first/vars")
first_train_op = optimizer.minimize(cost, var_list=first_train_vars)
second_train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                      "scope/prefix/for/second/vars")
second_train_op = optimizer.minimize(cost, var_list=second_train_vars)
I took it as is from mrry
To get the list of the names you should use instead of "scope/prefix/for/second/vars" you can use :
tf.get_default_graph().get_collection_ref(tf.GraphKeys.TRAINABLE_VARIABLES)
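The effect of var_list can be sketched in plain Python. The sgd_step function and the parameter names below are hypothetical; the sketch only shows the idea of selecting variables by a name prefix and updating nothing else.

```python
def sgd_step(params, grads, lr, var_list=None):
    """Apply one gradient-descent step, restricted to var_list if given."""
    names = list(params) if var_list is None else var_list
    for name in names:
        params[name] -= lr * grads[name]


params = {"layer1/w": 1.0, "layer2/w": 1.0}
grads = {"layer1/w": 0.5, "layer2/w": 0.5}

# Select variables by scope prefix, similar in spirit to get_collection(..., scope).
first_vars = [name for name in params if name.startswith("layer1/")]
sgd_step(params, grads, lr=0.1, var_list=first_vars)
```

After this step only the layer1 parameter has moved; the layer2 parameter behaves like a constant for that update.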

Tensorflow: Using weights trained in one model inside another, different model

I'm trying to train an LSTM in Tensorflow using minibatches, but after training is complete I would like to use the model by submitting one example at a time to it. I can set up the graph within Tensorflow to train my LSTM network, but I can't use the trained result afterward in the way I want.
The setup code looks something like this:
#Build the LSTM model.
cellRaw = rnn_cell.BasicLSTMCell(LAYER_SIZE)
cellRaw = rnn_cell.MultiRNNCell([cellRaw] * NUM_LAYERS)
cell = rnn_cell.DropoutWrapper(cellRaw, output_keep_prob = 0.25)
input_data = tf.placeholder(dtype=tf.float32, shape=[SEQ_LENGTH, None, 3])
target_data = tf.placeholder(dtype=tf.float32, shape=[SEQ_LENGTH, None])
initial_state = cell.zero_state(batch_size=BATCH_SIZE, dtype=tf.float32)
with tf.variable_scope('rnnlm'):
    output_w = tf.get_variable("output_w", [LAYER_SIZE, 6])
    output_b = tf.get_variable("output_b", [6])
outputs, final_state = seq2seq.rnn_decoder(input_list, initial_state, cell, loop_function=None, scope='rnnlm')
output = tf.reshape(tf.concat(1, outputs), [-1, LAYER_SIZE])
output = tf.nn.xw_plus_b(output, output_w, output_b)
...Note the two placeholders, input_data and target_data. I haven't bothered including the optimizer setup. After training is complete and the training session closed, I would like to set up a new session that uses the trained LSTM network whose input is provided by a completely different placeholder, something like:
with tf.Session() as sess:
    with tf.variable_scope("simulation", reuse=None):
        cellSim = cellRaw
        input_data_sim = tf.placeholder(dtype=tf.float32, shape=[1, 1, 3])
        initial_state_sim = cell.zero_state(batch_size=1, dtype=tf.float32)
        input_list_sim = tf.unpack(input_data_sim)
        outputsSim, final_state_sim = seq2seq.rnn_decoder(input_list_sim, initial_state_sim, cellSim, loop_function=None, scope='rnnlm')
        outputSim = tf.reshape(tf.concat(1, outputsSim), [-1, LAYER_SIZE])
        with tf.variable_scope('rnnlm'):
            output_w = tf.get_variable("output_w", [LAYER_SIZE, nOut])
            output_b = tf.get_variable("output_b", [nOut])
        outputSim = tf.nn.xw_plus_b(outputSim, output_w, output_b)
This second part returns the following error:
tensorflow.python.framework.errors.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype float
[[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
...Presumably because the graph I'm using still has the old training placeholders attached to the trained LSTM nodes. What's the right way to 'extract' the trained LSTM and put it into a new, different graph that has a different style of inputs? The Variable scoping features that Tensorflow has seem to address something like this, but the examples in the documentation all talk about using variable scope as a way of managing variable names so that the same piece of code will generate similar subgraphs within the same graph. The 'reuse' feature seems to be close to what I want, but I don't find the Tensorflow documentation linked above to be clear at all on what it does. The cells themselves cannot be given a name (in other words,
cellRaw = rnn_cell.MultiRNNCell([cellRaw] * NUM_LAYERS, name="multicell")
is not valid), and while I can give a name to a seq2seq.rnn_decoder(), I presumably wouldn't be able to remove the rnn_cell.DropoutWrapper() if I used that node unchanged.
Questions:
What is the proper way to move trained LSTM weights from one graph to another?
Is it correct to say that starting a new session "releases resources", but doesn't erase the graph built in memory?
It seems to me like the 'reuse' feature allows Tensorflow to search outside of the current variable scope for variables with the same name (existing in a different scope), and use them in the current scope. Is this correct? If it is, what happens to all of the graph edges from the non-current scope that link to that variable? If it isn't, why does Tensorflow throw an error if you try to have the same variable name within two different scopes? It seems perfectly reasonable to define two variables with identical names in two different scopes, e.g. conv1/sum1 and conv2/sum1.
In my code I'm working within a new scope but the graph won't run without data to be fed into a placeholder from the initial, default scope. Is the default scope always 'in-scope' for some reason?
If graph edges can span different scopes, and names in different scopes can't be shared unless they refer to the exact same node, then that would seem to defeat the purpose of having different scopes in the first place. What am I misunderstanding here?
Thanks!
What is the proper way to move trained LSTM weights from one graph to another?
You can create your decoding graph first (with a saver object to save the parameters) and create a GraphDef object that you can import in your bigger training graph:
basegraph = tf.Graph()
with basegraph.as_default():
    ***your graph***
traingraph = tf.Graph()
with traingraph.as_default():
    tf.import_graph_def(basegraph.as_graph_def())
    ***your training graph***
Make sure you load your variables when you start a session for a new graph.
I don't have experience with this functionality, so you may have to look into it a bit more.
Is it correct to say that starting a new session "releases resources", but doesn't erase the graph built in memory?
Yes, the graph object still holds it.
It seems to me like the 'reuse' feature allows Tensorflow to search outside of the current variable scope for variables with the same name (existing in a different scope), and use them in the current scope. Is this correct? If it is, what happens to all of the graph edges from the non-current scope that link to that variable? If it isn't, why does Tensorflow throw an error if you try to have the same variable name within two different scopes? It seems perfectly reasonable to define two variables with identical names in two different scopes, e.g. conv1/sum1 and conv2/sum1.
No, reuse determines the behaviour when you call get_variable on an existing name: when it is true it will return the existing variable, otherwise it will return a new one. Normally tensorflow should not throw an error. Are you sure you're using tf.get_variable and not just tf.Variable?
In my code I'm working within a new scope but the graph won't run without data to be fed into a placeholder from the initial, default scope. Is the default scope always 'in-scope' for some reason?
I don't really see what you mean. They do not always have to be used. If a placeholder is not required for running an operation, you don't have to feed it.
If graph edges can span different scopes, and names in different scopes can't be shared unless they refer to the exact same node, then that would seem to defeat the purpose of having different scopes in the first place. What am I misunderstanding here?
I think your understanding or usage of scopes is flawed; see above.
