tf.estimator input_fn and eager mode

tf.estimator input_fn and eager mode - python

I tried to use numpy inside cnn_model.evaluate(), but it gave AttributeError: 'Tensor' object has no attribute 'numpy'. I used numpy to calculate accuracy and mean squared error using tf.keras.metrics.Accuracy() and tf.keras.metrics.MeanSquaredError() inside cnn_model.evaluate()
I googled it, and in tensorflow documentation, it said
"Calling methods of Estimator will work while eager execution is enabled. However, the model_fn and input_fn is not executed eagerly, Estimator will switch to graph mode before calling all user-provided functions (incl. hooks), so their code has to be compatible with graph mode execution."
So, I was wondering how I can update the current tf 1.x code to tf 2.1.0 code, while also using above information.
My current code is:
eval_input_fn = tf.compat.v1.estimator.inputs.numpy_input_fn(
x={"x": np.array(train_inputs, dtype=np.float32)},
y=np.array(train_labels, dtype=np.float32),
#y=np.array(train_labels),
batch_size=1,
num_epochs=1,
shuffle=False)
eval_results = CNN.evaluate(input_fn=eval_input_fn)
What I have tried so far is add tf.compat.v1.enable_eager_execution() to the 1) beginning of the code after all the imports, 2) next line right after importing tf, 3) line right before declaring eval_input_fn, 4) line right before calling eval_results, 5) inside CNN model definition. It all failed to turn on the eager mode.
One other option that I found was remove #tf.function decorator, but I have no idea what that means and how to pass input_fn if #tf.function is removed.

Related

Unable to debug where torch Adam optimiser is going wrong

I was implementing a training loop in vscode. I have created a Adam optimizer using XLM-Roberta model as follows:
xlm_r_model = XLMRobertaForSequenceClassification.from_pretrained("xlm-roberta-base",
num_labels = NUM_LABELS,
output_attentions=False,
output_hidden_states=False
)
xlm_r_model.to(device)
optimizer = torch.optim.Adam(xlm_r_model.parameters(), lr=LR)
Then at following line:
optimizer.step()
vscode simply terminates the execution, without any error stack trace.
So I debugged to get to know exactly where this is happening. I reached this line, which makes F.adam(...) call:
Weirdly, on github, torch.optim.adam does not have this line. It seems that the closest matching line is line 150.
This call then goes to torch.optim._functional.adam:
In above image, those params (line 72) in for loop contains 201 elements and am unable to figure it out exactly which param is going wrong. When I continue it to run, it doesn't pause in debug mode whenever error occurs, instead vscode simply terminates.
Again, I am not able to find this function on github's _functional version
When I checked several Kaggle notebooks (1,2,3,4) for training xlm roberta, they are using AdamW and torch_xla package to train on TPUs something like this:
import torch_xla.core.xla_model as xm
optimizer = AdamW([{'params': model.roberta.parameters(), 'lr': LR},
{'params': [param for name, param in model.named_parameters() if 'roberta' not in name], 'lr': 1e-3} ], lr=LR, weight_decay=0)
xm.optimizer_step(optimizer)
Do I miss some contenxt and it is indeed compulsory to train using AdamW or torch_xla? Or am doing some stupid mistake?
PS:
Am running this no colab . Its pip shows torch version 1.10.0+cu111 and python 3.7.13. I have run codeserver on colab through colabcode and debugging in browser based vscode.
I was able to train Bert with Adam optimizer earlier.

How to use evaluate and predict functions in keras implementation of SincNet?

thanks for your atention, I'm developing an automatic speaker recognition system using SincNet.
Ravanelli, M., & Bengio, Y. (2018, December). Speaker recognition from raw waveform with sincnet. In 2018 IEEE Spoken Language Technology Workshop (SLT) (pp. 1021-1028). IEEE.
Since the network is coded in Pytorch I searched and found a Keras implementation here https://github.com/grausof/keras-sincnet. I adapted the train.py code to train a Sincnet with my own data in Tensorflow 2.0, and worked fine, I saved only the weights of my trained network, my training data has shape 128,3200,1 for inputs and 128 for labels per batch
#Creates a Sincnet model with input_size=3200 (wlen), num_classes=40, fs=16000
redsinc = create_model(wlen,num_classes,fs)
#Saves only weights and stopearly callback
checkpointer = ModelCheckpoint(filepath='checkpoints/SincNetBiomex3.hdf5',verbose=1,
save_best_only=True, monitor='val_accuracy',save_weights_only=True)
stopearly = EarlyStopping(monitor='val_accuracy',patience=3,verbose=1)
callbacks = [checkpointer,stopearly]
# optimizer = RMSprop(lr=learnrate, rho=0.9, epsilon=1e-8)
optimizer = Adam(learning_rate=learnrate)
# Creates generator of training batches
train_generator = batchGenerator(batch_size,train_inputs,train_labels,wlen)
validinputs, validlabels = create_batches_rnd(validation_labels.shape[0],
validation_inputs,validation_labels,wlen)
#Compiling model and train with function fit_generator
redsinc.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
history = redsinc.fit_generator(train_generator, steps_per_epoch=N_batches, epochs = epochs,
verbose = 1, callbacks=callbacks, validation_data=(validinputs,validlabels))
The problem came when I tried to evaluate the network, I didn't use the code found in test.py, I only loaded the weights I previously saved and use the function evaluate, my test data had the shape 1200,3200,1 for the inputs and 1200 for labels.
# Create a Sincnet model and load previously saved weights
redsinc = create_model(wlen,num_clases,fs)
redsinc.load_weights('checkpoints/SincNetBiomex3.hdf5')
test_loss, test_accuracy = redsinc.evaluate(x=eval_in,y=eval_lab)
RuntimeError: You must compile your model before training/testing. Use `model.compile(optimizer,
loss)`.
Then I added the same compile code I used for training:
optimizer = Adam(learning_rate=0.001)
redsinc.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
Then rerun the test code and got this:
WARNING:tensorflow:From C:\Users\atenc\Anaconda3\envs\py3.7-tf2.0gpu\lib\site-
packages\tensorflow_core\python\ops\resource_variable_ops.py:1781: calling
BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is
deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
ValueError: A tf.Variable created inside your tf.function has been garbage-collected. Your code needs to keep Python references to variables created inside `tf.function`s.
A common way to raise this error is to create and return a variable only referenced inside your function:
#tf.function
def f():
v = tf.Variable(1.0)
return v
v = f() # Crashes with this error message!
The reason this crashes is that #tf.function annotated function returns a **`tf.Tensor`** with the **value** of the variable when the function is called rather than the variable instance itself. As such there is no code holding a reference to the `v` created inside the function and Python garbage collects it.
The simplest way to fix this issue is to create variables outside the function and capture them:
v = tf.Variable(1.0)
#tf.function
def f():
return v
f() # <tf.Tensor: ... numpy=1.>
v.assign_add(1.)
f() # <tf.Tensor: ... numpy=2.>
I don't understand the error since I've evaluated other networks with the same function and never got any problems. Then I decided to use predict function to match predicted labels with correct labels and obtain all metrics with my own code but I got another error.
# Create a Sincnet model and load previously saved weights
redsinc = create_model(wlen,num_clases,fs)
redsinc.load_weights('checkpoints/SincNetBiomex3.hdf5')
print('Model loaded')
#Predict labels with test data
predict_labels = redsinc.predict(eval_in)
Error while reading resource variable _AnonymousVar212 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/_AnonymousVar212/class tensorflow::Var does not exist.
[[node sinc_conv1d/concat_104/ReadVariableOp (defined at \Users\atenc\Anaconda3\envs\py3.7-tf2.0gpu\lib\site-packages\tensorflow_core\python\framework\ops.py:1751) ]] [Op:__inference_keras_scratch_graph_13649]
Function call stack:
keras_scratch_graph
I hope someone can tell me what these errors mean and how to solve them, I've searched for solutions to them but most of the solutions I've found don't seem related to my problem so I can't apply those solutions. I'm guessing the errors are caused by the Sincnet layer code, because it is a custom coded layer. The code for Sincnet layer can be found in the github repository in the file sincnet.py.
I appreciate all help I can get, again thank you for your atention.

You should downgrade your tf and keras version, it works to me when I faced the same problem.
Try this keras==2.1.6; tensorflow-gpu==1.13.1

How to write to TensorBoard in TensorFlow 2

I'm quite familiar in TensorFlow 1.x and I'm considering to switch to TensorFlow 2 for an upcoming project. I'm having some trouble understanding how to write scalars to TensorBoard logs with eager execution, using a custom training loop.
Problem description
In tf1 you would create some summary ops (one op for each thing you would want to store), which you would then merge into a single op, run that merged op inside a session and then write this to a file using a FileWriter object. Assuming sess is our tf.Session(), an example of how this worked can be seen below:
# While defining our computation graph, define summary ops:
# ... some ops ...
tf.summary.scalar('scalar_1', scalar_1)
# ... some more ops ...
tf.summary.scalar('scalar_2', scalar_2)
# ... etc.
# Merge all these summaries into a single op:
merged = tf.summary.merge_all()
# Define a FileWriter (i.e. an object that writes summaries to files):
writer = tf.summary.FileWriter(log_dir, sess.graph)
# Inside the training loop run the op and write the results to a file:
for i in range(num_iters):
summary, ... = sess.run([merged, ...], ...)
writer.add_summary(summary, i)
The problem is that sessions don't exist anymore in tf2 and I would prefer not disabling eager execution to make this work. The official documentation is written for tf1 and all references I can find suggest using the Tensorboard keras callback. However, as far as I know, this only works if you train the model through model.fit(...) and not through a custom training loop.
What I've tried
The tf1 version of tf.summary functions, outside of a session. Obviously any combination of these functions fails, as FileWriters, merge_ops, etc. don't even exist in tf2.
This medium post states that there has been a "cleanup" in some tensorflow APIs including tf.summary(). They suggest using from tensorflow.python.ops.summary_ops_v2, which doesn't seem to work. This implies using a record_summaries_every_n_global_steps; more on this later.
A series of other posts 1, 2, 3, suggest using the tf.contrib.summary and tf.contrib.FileWriter. However, tf.contrib has been removed from the core TensorFlow repository and build process.
A TensorFlow v2 showcase from the official repo, which again uses the tf.contrib summaries along with the record_summaries_every_n_global_steps mentioned previously. I couldn't make this to work either (even without using the contrib library).
tl;dr
My questions are:
Is there a way to properly use tf.summary in TensroFlow 2?
If not, is there another way to write TensorBoard logs in TensorFlow 2, when using a custom training loop (not model.fit())?

Yes, there is a simpler and more elegant way to use summaries in TensorFlow v2.
First, create a file writer that stores the logs (e.g. in a directory named log_dir):
writer = tf.summary.create_file_writer(log_dir)
Anywhere you want to write something to the log file (e.g. a scalar) use your good old tf.summary.scalar inside a context created by the writer. Suppose you want to store the value of scalar_1 for step i:
with writer.as_default():
tf.summary.scalar('scalar_1', scalar_1, step=i)
You can open as many of these contexts as you like inside or outside of your training loop.
Example:
# create the file writer object
writer = tf.summary.create_file_writer(log_dir)
for i, (x, y) in enumerate(train_set):
with tf.GradientTape() as tape:
y_ = model(x)
loss = loss_func(y, y_)
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
# write the loss value
with writer.as_default():
tf.summary.scalar('training loss', loss, step=i+1)

TensorFlow 2.0 tf.keras API Eager mode vs. Graph mode

In TensorFlow <2 the training function for a DDPG actor could be concisely implemented using tf.keras.backend.function as follows:
critic_output = self.critic([self.actor(state_input), state_input])
actor_updates = self.optimizer_actor.get_updates(params=self.actor.trainable_weights,
loss=-tf.keras.backend.mean(critic_output))
self.actor_train_on_batch = tf.keras.backend.function(inputs=[state_input],
outputs=[self.actor(state_input)],
updates=actor_updates)
Then during each training step calling self.actor_train_on_batch([np.array(state_batch)]) would compute the gradients and perform the updates.
However running that on TF 2.0 gives the following error due to eager mode being on by default:
actor_updates = self.optimizer_actor.get_updates(params=self.actor.trainable_weights, loss=-tf.keras.backend.mean(critic_output))
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras\optimizer_v2\optimizer_v2.py", line 448, in get_updates
grads = self.get_gradients(loss, params)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras\optimizer_v2\optimizer_v2.py", line 361, in get_gradients
grads = gradients.gradients(loss, params)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 158, in gradients
unconnected_gradients)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 547, in _GradientsHelper
raise RuntimeError("tf.gradients is not supported when eager execution "
RuntimeError: tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.
As expected, disabling eager execution via tf.compat.v1.disable_eager_execution() fixes the issue.
However I don't want to disable eager execution for everything - I would like to use purely the 2.0 API.
The exception suggests using tf.GradientTape instead of tf.gradients but that's an internal call.
Question: What is the appropriate way of computing -tf.keras.backend.mean(critic_output) in graph mode (in TensorFlow 2.0)?

As far as I understood, your critic_output is just a TensorFlow tensor, so you can just use tf.math.reduce_mean operation. And it'll work in a TensorFlow session, not in imperative style. I.e. this will return an operation to be evaluated in a TensorFlow session.
import tensorflow as tf
import numpy as np
inp = tf.placeholder(dtype=tf.float32)
mean_op = tf.math.reduce_mean(inp)
with tf.Session() as sess:
print(sess.run(mean_op, feed_dict={inp: np.ones(10)}))
print(sess.run(mean_op, feed_dict={inp: np.random.randn(10)}))
It'll evaluate in something like:
1.0
-0.002577734

So, first of all you error is related to the fact that optimizer.get_updates() is designed for graph mode as it does include the K.gradients() needed to get the gradients tensors and then apply the Keras optimizer-based update to the trainable variables of the model using the K.function.
Secondly, in terms of eager-mode-or-not soundness the cost function loss=-tf.keras.backend.mean(critic_output) has no flows.
What you should to is get rid of your graph mode code and stick to the native 2.0 eager mode. Based on your code the training should look like:
def train_method(self, state_input):
with tf.GradientTape() as tape:
critic_output = self.critic([self.actor(state_input), state_input])
loss=-tf.keras.backend.mean(critic_output)
grads = tape.gradient(loss, params=self.actor.trainable_variables)
# now please note that self.optimizer_actor must have apply_gradients
# so it should be tf.train.OptimizerName...
self.optimizer_actor.apply_gradients(zip(grads, self.actor.trainable_variables))

Tensorflow - Using tf.summary with 1.2 Estimator API

I'm trying to add some TensorBoard logging to a model which uses the new tf.estimator API.
I have a hook set up like so:
summary_hook = tf.train.SummarySaverHook(
save_secs=2,
output_dir=MODEL_DIR,
summary_op=tf.summary.merge_all())
# ...
classifier.train(
input_fn,
steps=1000,
hooks=[summary_hook])
In my model_fn, I am also creating a summary -
def model_fn(features, labels, mode):
# ... model stuff, calculate the value of loss
tf.summary.scalar("loss", loss)
# ...
However, when I run this code, I get the following error from the summary_hook:
Exactly one of scaffold or summary_op must be provided. This is probably because tf.summary.merge_all() is not finding any summaries and is returning None, despite the tf.summary.scalar I declared in the model_fn.
Any ideas why this wouldn't be working?

Use tf.train.Scaffold() and pass tf.merge_all as following
summary_hook = tf.train.SummarySaverHook(
save_secs=2,
output_dir=MODEL_DIR,
scaffold=tf.train.Scaffold(summary_op=tf.summary.merge_all()))

Just for whoever have this question in the future, the selected solution doesn't work for me (see my comments in the selected solution).
Actually, with TF 1.2 Estimator API, one doesn't need to have summary_hook. I just have tf.summary.scalar("loss", loss) in the model_fn, and run the code without summary_hook. The loss is recorded and shown in the tensorboard. I'm not sure if TF API was changed after this and similar questions.

with Tensorflow ver-r1.3
Add your summary ops in your estimator model_fn
example :
tf.summary.histogram(tensorOp.name, tensorOp)
If you feel writing summaries may consume time and space, you can control the writing frequency of summaries, in your Estimator run_config
run_config = tf.contrib.learn.RunConfig()
run_config = run_config.replace(model_dir=FLAGS.model_dir)
run_config = run_config.replace(save_summary_steps=150)
Note: this will affect the overall summary writer frequency for TensorBoard logging, of your estimator (tf.estimator.Estimator)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.