TensorFlow How to Initialize Global Step - python

I'm trying to run a training session, and I get this error when my algorithm runs (I'm using tf.train.get_global_step()):
ValueError: global_step is required for exponential_decay.
For some reason, tf.train.get_or_create_global_step() doesn't exist for me. I'm not sure whether that's because the method was removed or for some other reason. I updated TensorFlow, and as far as I can tell I'm up to date.
I've dug around the documentation and there's nothing about it. To run I'm using tf.app.run() with a main function.
Is there another way to initialize the global step variable?

Although tf.train.get_or_create_global_step() is perfectly fine, here is another solution:
g_step = tf.get_variable('global_step', trainable=False, initializer=0)
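# decay_steps and decay_rate are required arguments; the values here are only illustrative
learning_rate = tf.train.exponential_decay(0.1, g_step, decay_steps=1000, decay_rate=0.96)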
tf.train.AdamOptimizer(learning_rate).minimize(loss=loss, global_step=g_step)
This creates a non-trainable variable initialized to zero and passes it to the optimizer, which increments it on every training step.
If you need the value of global_step later, use tf.train.global_step():
sess = tf.Session()
# Initialize the variable
sess.run(g_step.initializer)
print('global_step: %s' % tf.train.global_step(sess, g_step))
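To see why the step counter matters, here is a minimal pure-Python sketch of the decay formula documented for tf.train.exponential_decay, learning_rate * decay_rate ^ (global_step / decay_steps). The function below is a stand-in written for illustration, not TensorFlow's implementation, and the decay_steps and decay_rate values are assumptions. It also mimics the error from the question: tf.train.get_global_step() returns None when no global step variable has been created, and a None step cannot be used in the formula.

```python
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate,
                      staircase=False):
    """Illustrative stand-in for the documented exponential decay formula."""
    if global_step is None:
        # Same message as the error reported in the question.
        raise ValueError("global_step is required for exponential_decay.")
    if staircase:
        # Integer division makes the rate drop in discrete steps.
        exponent = global_step // decay_steps
    else:
        exponent = global_step / decay_steps
    return learning_rate * decay_rate ** exponent


# Hypothetical schedule: halve the rate every 1000 steps.
rates = [exponential_decay(0.1, step, decay_steps=1000, decay_rate=0.5)
         for step in (0, 1000, 2000)]
print(rates)
```

Because the rate depends on the current step, the optimizer has to increment the same variable that the decay function reads, which is why g_step is passed to both calls above.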

It turned out the function wasn't showing up because I wasn't actually on the newest version of TensorFlow, even though I was being told I was completely up to date.
To fix it, I uninstalled tensorflow and then reinstalled it from the actual download link. I don't have the link anymore, but a quick Google search should find it.

Related

AttributeError: module 'tensorflow.contrib.learn.python.learn.ops' has no attribute 'split_squeeze'

I am using an LSTM predictor for time series prediction:
regressor = skflow.Estimator(model_fn=lstm_model(TIMESTEPS, RNN_LAYERS, DENSE_LAYERS))
validation_monitor = learn.monitors.ValidationMonitor(X['val'], y['val'],
                                                      every_n_steps=PRINT_STEPS,
                                                      early_stopping_rounds=1000)
regressor.fit(X['train'], y['train'], monitors=[validation_monitor])
But when I call regressor.fit, I get the error shown in the title. I need help with this.
I understand that your code imports lstm_model from the file lstm_predictor.py when initializing your estimator. If so, the problem is caused by the following line:
x_ = learn.ops.split_squeeze(1, time_steps, X)
As the README.md of that repo says, the TensorFlow API has changed significantly. The function split_squeeze also seems to have been removed from the module tensorflow.contrib.learn.python.learn.ops. This issue has been discussed in that repository, but no changes have been made there for two years.
You can, however, simply replace that function with tf.unstack. Change the line to:
x_ = tf.unstack(X, num=time_steps, axis=1)
With this I was able to get past the problem.
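For readers unsure what the replacement does to the data, here is a minimal pure-Python sketch of unstacking along axis 1. It uses nested lists in place of tensors, and the helper name unstack_axis1 and the sample values are made up for illustration. An input shaped [batch, time_steps, features] becomes a list of time_steps items, each shaped [batch, features], which is the per-time-step layout the RNN code expects.

```python
def unstack_axis1(batch, num):
    """Split a [batch, time, features] nested list into `num` [batch, features] lists."""
    for sample in batch:
        if len(sample) != num:
            raise ValueError("expected %d time steps, got %d" % (num, len(sample)))
    # One output entry per time step, each holding that step for every sample.
    return [[sample[t] for sample in batch] for t in range(num)]


# Hypothetical input: 2 samples, 3 time steps, 1 feature per step.
X = [
    [[1.0], [2.0], [3.0]],
    [[4.0], [5.0], [6.0]],
]
steps = unstack_axis1(X, num=3)
print(len(steps))   # number of time steps
print(steps[0])     # first time step for every sample in the batch
```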

In Tensorflow what is GraphKeys.INIT_OP for?

Looking at the documentation for GraphKeys: https://www.tensorflow.org/api_docs/python/tf/GraphKeys
There is a GraphKeys.INIT_OP listed that has no documentation.
What is this collection for exactly?
I'm looking for the best way to add a few necessary assign OPs to the graph such that they will be run once at initialization time only. My initial thought was to add them to GraphKeys.GLOBAL_VARIABLES which are run at the time sess.run(tf.global_variables_initializer()) is run. When I saw GraphKeys.INIT_OP I wondered if it perhaps offered a more robust option?
INIT_OP should contain the global variable initialization op. By default it contains an op that, when run, runs these two:
variables.global_variables_initializer()
resources.initialize_resources(resources.shared_resources())
LOCAL_INIT_OP should contain the local variable initialization op. By default it contains an op that, when run, runs these three:
variables.local_variables_initializer()
lookup_ops.tables_initializer()
resources.initialize_resources(resources.local_resources())

Why match_filenames_once function returns a local variable

I was trying to understand TensorFlow's mechanism for reading images using queues. I was using the code found here, whose basic parts are:
filename_queue = tf.train.string_input_producer(tf.train.match_filenames_once('D:/Dataset/*.jpg'))
image_reader = tf.WholeFileReader()
image_name, image_file = image_reader.read(filename_queue)
image = tf.image.decode_jpeg(image_file)
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    image_tensor = sess.run([image])
    print(image_tensor)
which in reality does nothing special. I was getting an error:
OutOfRangeError (see above for traceback): FIFOQueue
'_0_input_producer' is closed and has insufficient elements (requested
1, current size 0)
which led me to search for missing images, a wrong folder, a wrong glob pattern and so on, until I discovered that TensorFlow basically meant this:
"You need to initialize local variables also"!
Besides the fact that the code seemed to work in the original gist with just this substitution:
tf.initialize_all_variables().run()
instead of
tf.global_variables_initializer().run()
in my code it does not work and produces the same error. I guess the implementation of initialize_all_variables() changed during TensorFlow's development (I am using 1.3.0), since here it is mentioned that it initializes local variables as well.
So my final conclusion was that I should initialize local variables too, and then my code worked. The error message is awfully misleading and did not help at all. But to get to the main point: I am a bit confused about why match_filenames_once gives me a local variable. The documentation makes no reference to this (although I am not sure it should).
Am I always going to get a local variable from match_filenames_once? Can I control it somehow?

What is the alternative of tf.Variable.ref() in Tensorflow version 0.12?

I'm trying to run an open-source implementation of the A3C reinforcement learning algorithm in order to learn how A3C works.
However, I got several errors, and I could fix all of them except one.
In the code, ref(), which is a member function of tf.Variable, is used (1, 2), but in the recent TensorFlow version 0.12rc that function seems to be deprecated.
I don't know the best way to replace it (I don't understand exactly why the author used ref()). When I just changed it to the variable itself (for example, v.ref() to v), there was no error, but the reward does not change. It seems the model cannot learn, and I guess that is because the variables are not properly updated.
Please advise me on the proper way to modify the code so that it works.
The new method tf.Variable.read_value() is the replacement for tf.Variable.ref() in TensorFlow 0.12 and later.
The use case for this method is slightly tricky to explain, and is motivated by some caching behavior that causes multiple uses of a remote variable on a different device to use a cached value. Let's say you have the following code:
with tf.device("/cpu:0"):
    v = tf.Variable([[1.]])
with tf.device("/gpu:0"):
    # The value of `v` will be captured at this point and cached until `m2`
    # is computed.
    m1 = tf.matmul(v, ...)
with tf.control_dependencies([m1]):
    # The assign happens (on the GPU) after `m1`, but before `m2` is computed.
    assign_op = v.assign([[2.]])
with tf.control_dependencies([assign_op]):
    with tf.device("/gpu:0"):
        # The initially read value of `v` (i.e. [[1.]]) will be used here,
        # even though `m2` is computed after the assign.
        m2 = tf.matmul(v, ...)
sess.run(m2)
You can use tf.Variable.read_value() to force TensorFlow to read the variable again later, and it will be subject to whatever control dependencies are in place. So if you wanted to see the result of the assign when computing m2, you'd modify the last block of the program as follows:
with tf.control_dependencies([assign_op]):
    with tf.device("/gpu:0"):
        # The `read_value()` call will cause TensorFlow to transfer the
        # new value of `v` from the CPU to the GPU before computing `m2`.
        m2 = tf.matmul(v.read_value(), ...)
(Note that, currently, if all of the ops were on the same device, you wouldn't need to use read_value(), because TensorFlow doesn't make a copy of the variable when it is used as the input to an op on the same device. This can cause a lot of confusion—for example when you enqueue a variable to a queue!—and it's one of the reasons that we're working on enhancing the memory model for variables.)

Tensorflow session returns as 'closed'

I have successfully ported the CIFAR-10 ConvNet tutorial code for my own images and am able to train on my data and generate Tensorboard outputs etc.
My next step was to implement an evaluation of new data against the model I built. I am now trying to use cifar10_eval.py as a starting point, but I am running into some difficulty.
I should point out that the original tutorial code runs entirely without a problem, including cifar10_eval.py. However, when moving this particular code to my application, I get the following error message (last line).
RuntimeError: Attempted to use a closed Session.
I found this error is thrown by TF's session.py
# Check session.
if self._closed:
    raise RuntimeError('Attempted to use a closed Session.')
I have checked the directories in which all files should reside and be created, and all seems exactly as it should (they mirror perfectly those created by running the original tutorial code). They include a train, eval and data folders, containing checkpoints/events files, events file, and data binaries respectively.
I wonder if you could help pointing out how I can debug this, as I'm sure there may be something in the data flow that got disrupted when transitioning the code. Unfortunately, despite digging deep and comparing to the original, I can't find the source, as they are essentially similar with trivial changes in file names and destination directories only.
EDIT_01:
Debugging step by step, it seems the line that actually throws the error is #106 in the original cifar10_eval.py:
def eval_once(args etc):
    ...
    with tf.Session() as sess:
        ...
        summary = tf.Summary()
        summary.ParseFromString(sess.run(summary_op))  # <========== line 106
summary_op is created in def evaluate of this same script and passed as an arg to def eval_once.
summary_op = tf.merge_all_summaries()
...
while True:
    eval_once(saver, summary_writer, top_k_op, summary_op)
According to the documentation on Session, a session can be closed with the .close command or by using it as a context manager in a with block. I ran find tensorflow/models/image/cifar10 | xargs grep "sess" and I don't see any sess.close, so it must be the latter.
That is, you'll get this error if you do something like this:
with tf.Session() as sess:
    sess.run(..)
sess.run(...)  # Attempted to use a closed Session.
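The same behavior can be reproduced without TensorFlow. The sketch below uses a hypothetical FakeSession class (not part of TensorFlow) that copies only the `_closed` check quoted from session.py in the question. Leaving the with block closes the session, so any call that has been dedented out of the block fails.

```python
class FakeSession:
    """Hypothetical stand-in that mimics only the closed-session check."""

    def __init__(self):
        self._closed = False

    def run(self, value):
        # Same check as the snippet quoted from session.py.
        if self._closed:
            raise RuntimeError('Attempted to use a closed Session.')
        return value

    def close(self):
        self._closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the `with` block closes the session.
        self.close()
        return False


with FakeSession() as sess:
    inside = sess.run(1)      # fine: the session is still open

try:
    sess.run(2)               # dedented: the session is already closed
    error = None
except RuntimeError as exc:
    error = str(exc)

print(inside, error)
```

When this error appears in ported code, a good first check is whether every sess.run call is still indented inside the with block that created the session.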
It was a simple (but humbling) error in indentation.
summary = tf.Summary()
summary.ParseFromString(sess.run(summary_op))
summary.value.add(tag='Precision # 1', simple_value=precision)
summary_writer.add_summary(summary, global_step)
was outside of the try: block, and of course, no session could be found.
Sigh.
