TensorFlow 2.0 tf.keras API: eager mode vs. graph mode

In TensorFlow < 2, the training function for a DDPG actor could be implemented concisely using tf.keras.backend.function as follows:
critic_output = self.critic([self.actor(state_input), state_input])
actor_updates = self.optimizer_actor.get_updates(params=self.actor.trainable_weights,
                                                 loss=-tf.keras.backend.mean(critic_output))
self.actor_train_on_batch = tf.keras.backend.function(inputs=[state_input],
                                                      outputs=[self.actor(state_input)],
                                                      updates=actor_updates)
Then, during each training step, calling self.actor_train_on_batch([np.array(state_batch)]) would compute the gradients and perform the updates.
However, running that on TF 2.0 gives the following error, because eager execution is enabled by default:
actor_updates = self.optimizer_actor.get_updates(params=self.actor.trainable_weights, loss=-tf.keras.backend.mean(critic_output))
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras\optimizer_v2\optimizer_v2.py", line 448, in get_updates
grads = self.get_gradients(loss, params)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras\optimizer_v2\optimizer_v2.py", line 361, in get_gradients
grads = gradients.gradients(loss, params)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 158, in gradients
unconnected_gradients)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 547, in _GradientsHelper
raise RuntimeError("tf.gradients is not supported when eager execution "
RuntimeError: tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.
As expected, disabling eager execution via tf.compat.v1.disable_eager_execution() fixes the issue.
However, I don't want to disable eager execution globally; I would like to use the pure 2.0 API.
The exception suggests using tf.GradientTape instead of tf.gradients, but tf.gradients is called internally by get_updates(), not by my own code.
Question: What is the appropriate way of computing -tf.keras.backend.mean(critic_output) in graph mode (in TensorFlow 2.0)?

As far as I understand, your critic_output is just a TensorFlow tensor, so you can use the tf.math.reduce_mean operation. It works in a TensorFlow session rather than imperatively, i.e. it returns an operation that has to be evaluated in a session:
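import numpy as np
import tensorflow as tf
# Build the graph explicitly; in TF 2.x, placeholders and sessions live in tf.compat.v1.
graph = tf.Graph()
with graph.as_default():
    inp = tf.compat.v1.placeholder(dtype=tf.float32)
    mean_op = tf.math.reduce_mean(inp)
with tf.compat.v1.Session(graph=graph) as sess:
    print(sess.run(mean_op, feed_dict={inp: np.ones(10)}))
    print(sess.run(mean_op, feed_dict={inp: np.random.randn(10)}))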
It prints something like (the second value is random):
1.0
-0.002577734
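For comparison, here is a minimal TF 2.x sketch (not from the original answer) of the same reduction without any session: called directly it runs eagerly, and inside a function decorated with tf.function it is traced into a graph. The helper name mean_fn and the traced_modes list are hypothetical; the list only records that tracing happens outside eager mode.

```python
import numpy as np
import tensorflow as tf

# Eager (TF 2.x default): the op executes immediately and returns a concrete value.
eager_mean = tf.math.reduce_mean(tf.ones([10]))

traced_modes = []  # hypothetical: records tf.executing_eagerly() at trace time


@tf.function  # the Python body is traced into a graph the first time it is called
def mean_fn(x):  # hypothetical helper name
    # Python side effect: runs only while the function is being traced.
    traced_modes.append(tf.executing_eagerly())
    return tf.math.reduce_mean(x)


graph_mean = mean_fn(tf.constant(np.ones(10, dtype=np.float32)))
print(float(eager_mean), float(graph_mean))  # 1.0 1.0
```

The same pattern applies to -tf.reduce_mean(critic_output): there is nothing graph-specific about the expression itself, only about how it is executed.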

First of all, your error occurs because optimizer.get_updates() is designed for graph mode: it calls K.gradients() to obtain the gradient tensors, and the resulting Keras optimizer updates are then applied to the model's trainable variables through K.function.
Secondly, the loss itself, loss=-tf.keras.backend.mean(critic_output), has no flaws: it is valid in both eager and graph mode.
What you should do is drop the graph-mode code and stick to native 2.0 eager execution. Based on your code, the training step should look like this:
def train_method(self, state_input):
    with tf.GradientTape() as tape:
        critic_output = self.critic([self.actor(state_input), state_input])
        loss = -tf.keras.backend.mean(critic_output)
    # GradientTape.gradient takes the sources positionally (it has no `params` argument)
    grads = tape.gradient(loss, self.actor.trainable_variables)
    # self.optimizer_actor must provide apply_gradients; tf.keras.optimizers
    # optimizers such as tf.keras.optimizers.Adam do (tf.train.*Optimizer is TF 1.x only)
    self.optimizer_actor.apply_gradients(zip(grads, self.actor.trainable_variables))
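A self-contained sketch of the same idea, assuming small hypothetical actor and critic models (STATE_DIM, ACTION_DIM, the layer sizes and the Adam learning rate are illustrative, not from the question). Decorating the step with tf.function compiles it into a graph, which is one TF 2.x way to get graph-mode execution without disable_eager_execution(). Only the actor's variables are passed to tape.gradient and to the optimizer, so the critic stays fixed during the actor update.

```python
import numpy as np
import tensorflow as tf

STATE_DIM, ACTION_DIM = 3, 2  # hypothetical sizes, for illustration only

# Hypothetical actor: state -> action in [-1, 1].
actor = tf.keras.Sequential([
    tf.keras.Input(shape=(STATE_DIM,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(ACTION_DIM, activation="tanh"),
])

# Hypothetical critic: [action, state] -> Q-value.
action_in = tf.keras.Input(shape=(ACTION_DIM,))
state_in = tf.keras.Input(shape=(STATE_DIM,))
hidden = tf.keras.layers.Concatenate()([action_in, state_in])
hidden = tf.keras.layers.Dense(16, activation="relu")(hidden)
critic = tf.keras.Model([action_in, state_in], tf.keras.layers.Dense(1)(hidden))

optimizer_actor = tf.keras.optimizers.Adam(learning_rate=1e-3)


@tf.function  # traced into a graph; no Session or disable_eager_execution() needed
def actor_train_on_batch(state_batch):
    with tf.GradientTape() as tape:
        critic_output = critic([actor(state_batch), state_batch])
        # Maximising Q is done by minimising its negative mean.
        loss = -tf.reduce_mean(critic_output)
    # Gradients w.r.t. the actor only; the critic's weights are not updated here.
    grads = tape.gradient(loss, actor.trainable_variables)
    optimizer_actor.apply_gradients(zip(grads, actor.trainable_variables))
    return loss


states = np.random.randn(8, STATE_DIM).astype(np.float32)
actor_before = [v.numpy().copy() for v in actor.trainable_variables]
critic_before = [v.numpy().copy() for v in critic.trainable_variables]
loss = actor_train_on_batch(tf.constant(states))
print("actor loss:", float(loss))
```

Usage mirrors the original actor_train_on_batch: call it once per training step with a batch of states.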

Related

tf.estimator input_fn and eager mode

I tried to call .numpy() inside cnn_model.evaluate(), but it raised AttributeError: 'Tensor' object has no attribute 'numpy'. I used .numpy() to compute accuracy and mean squared error with tf.keras.metrics.Accuracy() and tf.keras.metrics.MeanSquaredError() inside cnn_model.evaluate().
I searched for it, and the TensorFlow documentation says:
"Calling methods of Estimator will work while eager execution is enabled. However, the model_fn and input_fn is not executed eagerly, Estimator will switch to graph mode before calling all user-provided functions (incl. hooks), so their code has to be compatible with graph mode execution."
So I was wondering how to update the current TF 1.x code to TF 2.1.0, taking the above into account.
My current code is:
eval_input_fn = tf.compat.v1.estimator.inputs.numpy_input_fn(
    x={"x": np.array(train_inputs, dtype=np.float32)},
    y=np.array(train_labels, dtype=np.float32),
    #y=np.array(train_labels),
    batch_size=1,
    num_epochs=1,
    shuffle=False)
eval_results = CNN.evaluate(input_fn=eval_input_fn)
So far I have tried adding tf.compat.v1.enable_eager_execution() at: 1) the beginning of the code after all the imports, 2) the line right after importing tf, 3) the line right before declaring eval_input_fn, 4) the line right before computing eval_results, 5) inside the CNN model definition. None of these turned on eager mode.
Another option I found was to remove the @tf.function decorator, but I have no idea what that means or how to pass input_fn if @tf.function is removed.
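One TF 2.x-native way to write the input_fn above is with tf.data instead of tf.compat.v1.estimator.inputs.numpy_input_fn. This is a hedged sketch, not from the original thread, using hypothetical stand-in arrays for train_inputs and train_labels. As the quoted documentation says, Estimator still calls input_fn and model_fn in graph mode, so this does not make .numpy() available inside them.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins for the question's train_inputs / train_labels.
train_inputs = np.random.rand(5, 4)
train_labels = np.arange(5)


def eval_input_fn():
    # Roughly what numpy_input_fn(batch_size=1, num_epochs=1, shuffle=False) produced:
    # a single ordered pass over the data, one example per batch, features in a dict.
    dataset = tf.data.Dataset.from_tensor_slices((
        {"x": train_inputs.astype(np.float32)},
        train_labels.astype(np.float32),
    ))
    return dataset.batch(1)


for features, labels in eval_input_fn().take(2):
    print(features["x"].shape, labels.numpy())
```

It would then be passed exactly as before, e.g. CNN.evaluate(input_fn=eval_input_fn).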

How to write to TensorBoard in TensorFlow 2

I'm quite familiar with TensorFlow 1.x and I'm considering switching to TensorFlow 2 for an upcoming project. I'm having some trouble understanding how to write scalars to TensorBoard logs with eager execution, using a custom training loop.
Problem description
In tf1 you would create some summary ops (one op for each thing you would want to store), which you would then merge into a single op, run that merged op inside a session and then write this to a file using a FileWriter object. Assuming sess is our tf.Session(), an example of how this worked can be seen below:
# While defining our computation graph, define summary ops:
# ... some ops ...
tf.summary.scalar('scalar_1', scalar_1)
# ... some more ops ...
tf.summary.scalar('scalar_2', scalar_2)
# ... etc.
# Merge all these summaries into a single op:
merged = tf.summary.merge_all()
# Define a FileWriter (i.e. an object that writes summaries to files):
writer = tf.summary.FileWriter(log_dir, sess.graph)
# Inside the training loop run the op and write the results to a file:
for i in range(num_iters):
    summary, ... = sess.run([merged, ...], ...)
    writer.add_summary(summary, i)
The problem is that sessions don't exist anymore in tf2, and I would prefer not to disable eager execution to make this work. The official documentation is written for tf1, and all references I can find suggest using the TensorBoard Keras callback. However, as far as I know, this only works if you train the model through model.fit(...) and not through a custom training loop.
What I've tried
The tf1 version of tf.summary functions, outside of a session. Obviously any combination of these functions fails, as FileWriters, merge_ops, etc. don't even exist in tf2.
This Medium post states that there has been a "cleanup" in some TensorFlow APIs, including tf.summary(). They suggest importing from tensorflow.python.ops.summary_ops_v2, which doesn't seem to work. This implies using a record_summaries_every_n_global_steps; more on this later.
A series of other posts 1, 2, 3, suggest using the tf.contrib.summary and tf.contrib.FileWriter. However, tf.contrib has been removed from the core TensorFlow repository and build process.
A TensorFlow v2 showcase from the official repo, which again uses the tf.contrib summaries along with the record_summaries_every_n_global_steps mentioned previously. I couldn't make this work either (even without using the contrib library).
tl;dr
My questions are:
Is there a way to properly use tf.summary in TensorFlow 2?
If not, is there another way to write TensorBoard logs in TensorFlow 2, when using a custom training loop (not model.fit())?
Yes, there is a simpler and more elegant way to use summaries in TensorFlow v2.
First, create a file writer that stores the logs (e.g. in a directory named log_dir):
writer = tf.summary.create_file_writer(log_dir)
Anywhere you want to write something to the log file (e.g. a scalar) use your good old tf.summary.scalar inside a context created by the writer. Suppose you want to store the value of scalar_1 for step i:
with writer.as_default():
tf.summary.scalar('scalar_1', scalar_1, step=i)
You can open as many of these contexts as you like inside or outside of your training loop.
Example:
# create the file writer object
writer = tf.summary.create_file_writer(log_dir)
for i, (x, y) in enumerate(train_set):
    with tf.GradientTape() as tape:
        y_ = model(x)
        loss = loss_func(y, y_)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    # write the loss value
    with writer.as_default():
        tf.summary.scalar('training loss', loss, step=i+1)
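A runnable variant of the loop above, assuming a hypothetical one-parameter regression problem in place of model, loss_func and train_set, and a temporary directory as log_dir. The weight is updated with a plain gradient step so the sketch does not depend on a particular optimizer; the logging part is the same create_file_writer / as_default / tf.summary.scalar pattern.

```python
import os
import tempfile

import tensorflow as tf

log_dir = tempfile.mkdtemp()  # hypothetical log directory for the sketch
writer = tf.summary.create_file_writer(log_dir)

# Hypothetical toy problem: learn w so that w * x matches y = 3 * x.
w = tf.Variable(0.0)
x = tf.constant([1.0, 2.0, 3.0])
y = 3.0 * x

for step in range(1, 21):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(w * x - y))
    grad = tape.gradient(loss, w)
    w.assign_sub(0.1 * grad)  # plain gradient-descent step
    # Write the scalar inside the writer's context; `step` is required in TF 2.
    with writer.as_default():
        tf.summary.scalar("training_loss", loss, step=step)

writer.flush()  # make sure pending events reach the file on disk
event_files = [f for f in os.listdir(log_dir) if f.startswith("events.out.tfevents")]
print(event_files)
```

Pointing tensorboard --logdir at log_dir should then show the training_loss curve.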

Tensorflow control_dependencies not forcing specified operator to be ran first

Today I noticed some strange behaviour in TensorFlow and thought I'd ask here to understand what's happening. My problem revolves around tf.control_dependencies not forcing the specified operator to run before the operators I define inside the with block. What I am asking here is not how to compute the performance metrics (I coded that manually), but rather where my misconception lies.
So, to set the scene: today I wrote some code to log performance metrics during the training of a CNN, using the tensorflow.metrics module. However, the operators in this module accumulate previous results (so performance metrics can be computed for very large datasets). I want to log how the metrics evolve over time as the network trains, so I don't want this behaviour. Therefore, I wrapped the creation of these performance metric nodes in a tf.control_dependencies block, forcing (or so I thought) a tf.local_variables_initializer to be evaluated before my performance metrics are computed. My code looked roughly like this:
import tensorflow as tf
import numpy as np
labels = tf.convert_to_tensor(np.arange(10))
out = tf.convert_to_tensor(np.random.randn(10, 1))
with tf.control_dependencies([tf.local_variables_initializer()]):
    _, precision = tf.metrics.precision(labels, out)
with tf.Session() as sess:
    #sess.run(tf.local_variables_initializer())
    print(sess.run(precision))
But when I run the above code, I get the following error:
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value precision_4/true_positives/count
[[Node: precision_4/true_positives/AssignAdd = AssignAdd[T=DT_FLOAT, _class=["loc:@precision_4/true_positives/count"], use_locking=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](precision_4/true_positives/count, precision_4/true_positives/Sum)]]
Now, I have encountered this error many times while trying to understand the metrics module, and the cause has always been that I had not initialised my variables properly. Therefore, I tested this code:
import tensorflow as tf
import numpy as np
labels = tf.convert_to_tensor(np.arange(10))
out = tf.convert_to_tensor(np.random.randn(10, 1))
with tf.control_dependencies([tf.local_variables_initializer()]):
    _, precision = tf.metrics.precision(labels, out)
with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    print(sess.run(precision))
which does indeed work.
So my question remains: why is the tf.local_variables_initializer() node not run before the performance metrics are computed in my first code example?
This is indeed really strange. I guess you need to place
_, precision = tf.metrics.precision(labels, out)
before the control_dependencies like
import tensorflow as tf
import numpy as np
labels = tf.convert_to_tensor(np.arange(10))
out = tf.convert_to_tensor(np.random.randn(10, 1))
_, _precision = tf.metrics.precision(labels, out)
with tf.control_dependencies([tf.local_variables_initializer()]):
    # precision = tf.identity(_precision)
    precision = 1 * _precision
with tf.Session() as sess:
    print(sess.run(precision))
This works as expected because the local variables of tf.metrics.precision already exist when tf.local_variables_initializer() is called. In your code, tf.local_variables_initializer() is created before the precision node, so precision_4/true_positives/count does not exist yet when the initializer is built; it is never initialized, simply because that part of the graph does not exist at that point.
To make it even stranger:
Placing precision = 1 * _precision in the body of control_dependencies works, but precision = tf.identity(_precision) does not.
This is a good candidate for a bug in TensorFlow.
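A sketch of an ordering that avoids the problem, written with tf.compat.v1 so it also runs under TensorFlow 2.x (the labels and predictions are hypothetical toy values). The metric is created first, so the local_variables_initializer built afterwards covers its counters, and the reset is run explicitly in the session instead of relying on tf.control_dependencies. Note that tf.metrics.precision returns (value, update_op); it is the update_op that accumulates counts.

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    labels = tf.constant([1, 0, 1, 1])       # hypothetical ground truth
    predictions = tf.constant([1, 1, 1, 0])  # hypothetical predictions
    # Create the metric FIRST, so its local variables (TP/FP counters) exist...
    precision, update_op = tf.compat.v1.metrics.precision(labels, predictions)
    # ...then build the initializer, which now includes those counters.
    reset_op = tf.compat.v1.local_variables_initializer()

with tf.compat.v1.Session(graph=graph) as sess:
    sess.run(reset_op)   # zero the counters
    sess.run(update_op)  # accumulate this batch: TP = 2, FP = 1
    first = sess.run(precision)
    sess.run(reset_op)   # reset before the next logging interval
    after_reset = sess.run(precision)
    print(first, after_reset)
```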

TensorFlow: TypeError: Using a `tf.Tensor` as a Python `bool` is not allowed

I'm trying to define a triplet loss using descriptors from a CNN's output, but this error shows up when I try to train the network.
My definition of loss function:
def compute_loss(descriptor, margin):
    diff_pos = descriptor[0:1800:3] - descriptor[1:1800:3]
    diff_neg = descriptor[0:1800:3] - descriptor[2:1800:3]
    Ltriplet = np.maximum(0, 1 - tf.square(diff_neg)/(tf.square(diff_pos) + margin))
    Lpair = tf.square(diff_pos)
    Loss = Ltriplet + Lpair
    return Loss
Here descriptor is the output of the CNN; the input of the CNN is a set of triplets containing anchor, puller and pusher, exactly in this order. As input, I packed 600 triplets together and fed them into the CNN.
Then I got this error when training the network:
2018-03-08 16:40:49.529263: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Traceback (most recent call last):
File "/Users/gaoyingqiang/Documents/GitHub/Master-TUM/TDCV/exercise_3/ex3/task2_new.py", line 78, in <module>
loss = compute_loss(h_fc2, margin)
File "/Users/gaoyingqiang/Documents/GitHub/Master-TUM/TDCV/exercise_3/ex3/task2_new.py", line 37, in compute_loss
Ltriplet = np.maximum(0, 1 - tf.square(diff_neg)/(tf.square(diff_pos) + margin))
File "/Users/gaoyingqiang/.virtualenvs/ex3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 614, in __bool__
raise TypeError("Using a `tf.Tensor` as a Python `bool` is not allowed. "
TypeError: Using a `tf.Tensor` as a Python `bool` is not allowed. Use `if t is not None:` instead of `if t:` to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.
Process finished with exit code 1
What is going wrong?
You are mixing NumPy and TensorFlow operations. TensorFlow normally accepts NumPy arrays (their value is known statically, so they can be converted to a constant), but not vice versa: a tensor's value is known only when the session is run (except under eager execution).
The solution: change np.maximum to tf.maximum.
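The loss from the question with that single change, as a minimal sketch run eagerly on a hypothetical random descriptor batch (6 rows, i.e. two anchor/puller/pusher triplets, instead of 1800). Roughly speaking, np.maximum falls back to comparing 0 with the tensor in Python and converting the result to a bool, which is the Tensor.__bool__ frame in the traceback; tf.maximum keeps the whole computation inside TensorFlow.

```python
import tensorflow as tf


def compute_loss(descriptor, margin):
    # Rows are interleaved as anchor, puller, pusher, anchor, puller, pusher, ...
    diff_pos = descriptor[0:1800:3] - descriptor[1:1800:3]
    diff_neg = descriptor[0:1800:3] - descriptor[2:1800:3]
    # tf.maximum instead of np.maximum: the result stays a TensorFlow tensor.
    Ltriplet = tf.maximum(0.0, 1 - tf.square(diff_neg) / (tf.square(diff_pos) + margin))
    Lpair = tf.square(diff_pos)
    Loss = Ltriplet + Lpair
    return Loss


descriptor = tf.random.normal((6, 4))  # hypothetical: 2 triplets, 4-dim descriptors
loss = compute_loss(descriptor, margin=0.01)
print(loss.shape)
```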

Tensorflow - Using tf.summary with 1.2 Estimator API

I'm trying to add some TensorBoard logging to a model which uses the new tf.estimator API.
I have a hook set up like so:
summary_hook = tf.train.SummarySaverHook(
    save_secs=2,
    output_dir=MODEL_DIR,
    summary_op=tf.summary.merge_all())
# ...
classifier.train(
    input_fn,
    steps=1000,
    hooks=[summary_hook])
In my model_fn, I am also creating a summary:
def model_fn(features, labels, mode):
    # ... model stuff, calculate the value of loss
    tf.summary.scalar("loss", loss)
    # ...
However, when I run this code, I get the following error from the summary_hook:
Exactly one of scaffold or summary_op must be provided.
This is probably because tf.summary.merge_all() is not finding any summaries and is returning None, despite the tf.summary.scalar I declared in the model_fn.
Any ideas why this wouldn't be working?
Use tf.train.Scaffold() and pass tf.summary.merge_all() to it as follows:
summary_hook = tf.train.SummarySaverHook(
    save_secs=2,
    output_dir=MODEL_DIR,
    scaffold=tf.train.Scaffold(summary_op=tf.summary.merge_all()))
For anyone who has this question in the future: the selected solution doesn't work for me (see my comments on the selected solution).
Actually, with the TF 1.2 Estimator API, one doesn't need a summary_hook at all. I just call tf.summary.scalar("loss", loss) in the model_fn and run the code without summary_hook; the loss is recorded and shown in TensorBoard. I'm not sure whether the TF API changed after this and similar questions were asked.
With TensorFlow r1.3:
Add your summary ops in your Estimator model_fn, for example:
tf.summary.histogram(tensorOp.name, tensorOp)
If you feel that writing summaries consumes too much time and space, you can control how often they are written through your Estimator's run_config:
run_config = tf.contrib.learn.RunConfig()
run_config = run_config.replace(model_dir=FLAGS.model_dir)
run_config = run_config.replace(save_summary_steps=150)
Note: this changes the overall summary-writing frequency for TensorBoard logging of your Estimator (tf.estimator.Estimator).
