I am using TensorFlow 2.6, and my code requires the lines below at startup because I use a symbolic Keras tensor in a partial loss in my model:
from tensorflow.python.framework.ops import disable_eager_execution
disable_eager_execution()
At the same time, I also want to train on multiple GPUs, so I used MirroredStrategy. The issue is that MirroredStrategy requires eager execution, which conflicts with disabling it as above. Please help me if there is another way of training on multiple GPUs.
I have tried running my code with the line below, but it got stuck after printing a warning about significant overhead:
tf.config.run_functions_eagerly(True)
but I believe this is wrong since, as I mentioned, I need eager execution to stay disabled.
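For context, a minimal sketch of the conflicting setup described above (the model and loss here are placeholders, not the real code):

import tensorflow as tf
from tensorflow.python.framework.ops import disable_eager_execution

# Needed because the custom/partial loss works on symbolic Keras tensors
disable_eager_execution()

# Multi-GPU training -- but MirroredStrategy in TF 2.x expects eager execution,
# so combining it with the line above fails, as described in the question.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer="adam", loss="mse")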
Related
I'm working on a project where I have trained a series of binary classifiers with Keras, with Tensorflow as the backend engine. The input data I have is a series of images, where each binary classifier must make the prediction on the images, later I save the predictions on a CSV file.
The problem I have is when I get the predictions from the first series of binary classifiers there isn't any warning, but when the 5th or 6th binary classifier calls the method predict on the input data I get the following warning:
WARNING:tensorflow:5 out of the last 5 calls to <function
Model.make_predict_function.<locals>.predict_function at
0x2b280ff5c158> triggered tf.function retracing. Tracing is expensive
and the excessive number of tracings could be due to (1) creating
@tf.function repeatedly in a loop, (2) passing tensors with different
shapes, (3) passing Python objects instead of tensors. For (1), please
define your @tf.function outside of the loop. For (2), @tf.function
has experimental_relax_shapes=True option that relaxes argument shapes
that can avoid unnecessary retracing. For (3), please refer to
https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args
and https://www.tensorflow.org/api_docs/python/tf/function for more
details.
To answer each point in parentheses, here are my answers:
1. The predict method is called inside a for loop.
2. I don't pass tensors but a list of NumPy arrays of grayscale images, all of them with the same width and height. The only thing that can change is the batch size, because the list can contain one image or several.
3. As I wrote in point 2, I pass a list of NumPy arrays.
I have debugged my program and found that this warning always happens when the method predict is called. To summarize the code I have written is the following:
import cv2 as cv
import tensorflow as tf
from tensorflow.keras.models import load_model

# Load the models
binary_classifiers = [load_model(path) for path in path2models]

# Get the images
images = []  # load the images with OpenCV

# Apply the resizing and reshaping on the images
my_list = []
for image in images:
    image_reworked = ...  # apply the resizing and reshaping on each image
    my_list.append(image_reworked)

# Get the prediction from each model
# This is where I get the warning
predictions = [model.predict(x=my_list, verbose=0) for model in binary_classifiers]
What I have tried
I have decorated a function with tf.function and put the prediction code inside it, like this:
@tf.function
def testing(models, faces):
    return [model.predict(x=faces, verbose=0) for model in models]
But I ended up getting the following error:
RuntimeError: Detected a call to Model.predict inside a
tf.function. Model.predict is a high-level endpoint that manages
its own tf.function. Please move the call to Model.predict outside
of all enclosing tf.functions. Note that you can call a Model
directly on Tensors inside a tf.function like: model(x).
So the predict method already manages its own tf.function, and wrapping it in another tf.function is pointless; the warning comes from inside that method.
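For reference, a minimal sketch of what the error message suggests instead: calling the models directly on a tensor inside a single tf.function (stacking my_list into one tensor is an assumption about the input shapes):

import numpy as np
import tensorflow as tf

# Stack the preprocessed images into one tensor (assumes they all share the same shape)
batch = tf.convert_to_tensor(np.stack(my_list), dtype=tf.float32)

@tf.function
def predict_all(batch):
    # model(x) uses the model's __call__ instead of Model.predict,
    # which is allowed inside a tf.function
    return [model(batch, training=False) for model in binary_classifiers]

predictions = predict_all(batch)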
I have also checked those other two questions:
Tensorflow 2: Getting "WARNING:tensorflow:9 out of the last 9 calls to triggered tf.function retracing. Tracing is expensive"
Loading multiple saved tensorflow/keras models for prediction
But neither of them answers my question about how to avoid this warning. I have also checked the links in the warning message, but they did not solve my problem.
What I want
I simply want to avoid this warning. While I still get the predictions from the models, I noticed that the Python program takes far too long to run predictions on a list of images.
What I'm using
Python 3.6.13
Tensorflow 2.3.0
Solution
After some attempts to suppress the warning from the predict method, I checked the TensorFlow documentation. One of the first tutorials explains that, by default, TensorFlow runs in eager mode, which is useful for testing and debugging network models. Since I had already tested my models many times, it was only necessary to disable eager mode with this single line of Python:
tf.compat.v1.disable_eager_execution()
Now the warning doesn't show up anymore.
Providing the solution here for the benefit of the community (paraphrased from Simone's answer above): disabling eager execution with tf.compat.v1.disable_eager_execution() makes the warning disappear.
tf.compat.v1.disable_eager_execution() can only be called before any graphs, ops, or tensors have been created. It can be used at the beginning of the program when migrating projects from TensorFlow 1.x to 2.x.
For more details, refer to the Eager execution guide.
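A minimal sketch of that placement (the model here is only illustrative):

import tensorflow as tf

# Must run before any graph, op, or tensor is created
tf.compat.v1.disable_eager_execution()

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")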
The problem
I have a (very) small and fast model saved in the SavedModel format which I can load and run with the following code:
model = tf.keras.models.load_model("./<SavedModelDir>")
outputs = model(inputs, training=False)
This direct call runs in about 0.05 seconds for a batch of 5 inputs (on an NVIDIA GPU).
If, however, I use model.predict_on_batch(inputs) or model.predict(inputs), performance drops significantly to 0.65-0.80 seconds for a batch of 5. This is consistent with the documentation, which states that calling model() (__call__) directly is usually faster for smaller inputs.
The problem is that I am trying to port my model to a C(++) program, and with TF_SessionRun() from the C API and model_bundle.GetSession()->Run() from the C++ API I get performance similar to the "slow" Python inference methods.
What I have tried
Another (very) small model with small batch, same result.
I tried disabling graph optimizations with tf.config.optimizer.set_experimental_options({'disable_meta_optimizer': True}) to make sure they were not hurting performance, but this made things even slower.
I also tried converting the SavedModel to a TensorRT SavedModel. This increases the performance of the model() (__call__) method even further, but all the other methods stop working in Python, and with both the downloaded precompiled TensorFlow C GPU API (2.5.0) and the C++ API compiled with Tensorflow_CC I get an error about an operation not being found (TensorRT does not seem to work there).
All the performance numbers given were measured after a few warm-up runs.
Performance was measured both with the TensorFlow profiler and with Python's time.time.
I checked that model() (__call__) produces correct output, and it does.
My question(s)
Is there a way to get model() (__call__) performance with the Tensorflow C(++) API?
The problem seems to lie somewhere in TensorFlow's optimization for larger batch sizes, which hurts performance on smaller batches. Is there another API that allows faster inference on small batches out of the box (the TensorRT C++ API?)?
I think I figured it out by accident while trying something else: I put tf.compat.v1.disable_v2_behavior() at the top of the script and then called print(len(outputs)) right after getting the outputs. This gives the following error: TypeError: len is not well defined for symbolic Tensors.
By Googling I found out that symbolic tensors are tensors that do not directly hold values so the values are probably filled in later.
This means that model() (__call__) runs its computation asynchronously, so timing the call gives a misleading value. This can be "fixed" by stopping the timer only after printing/using every output, or by simply using the predict() method, which avoids the issue completely.
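A minimal sketch of that fix, assuming normal eager execution (i.e. without the disable_v2_behavior() line used only for the experiment) and a model returning a single tensor: force the values to materialize before stopping the timer.

import time

t0 = time.time()
outputs = model(inputs, training=False)  # may return before the GPU work is done
outputs_np = outputs.numpy()             # blocks until the values actually exist
print("elapsed:", time.time() - t0)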
I need to use multiple (my own) TF CNN models for a rather complex task. I also need to use easyOCR (PyTorch) every now and then. However, easyOCR usage is rare and its task is very small compared to the TF models' inference, so I pass gpu=False to the easyocr.Reader constructor. Nevertheless, as soon as easyOCR predicts anything, GPU memory is allocated for PyTorch (this is a known bug; I already checked easyOCR's GitHub issues) and any TF model then throws this error:
Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
If I predict with any TF model first, the easyocr model throws:
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/THCCachingHostAllocator.cpp:278
I have found a workaround, but it seems rather dangerous to put something like this into production.
Is there a safer way of achieving this?
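One mitigation often suggested for this kind of TF/PyTorch GPU contention (not from the original post, so treat it as an assumption) is to stop TensorFlow from pre-allocating the whole GPU and let it grow its memory on demand:

import tensorflow as tf

# Ask TensorFlow to allocate GPU memory only as needed instead of grabbing it all,
# leaving room for the occasional PyTorch (easyOCR) allocation.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)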
Let's say that I want to fine-tune one of the TensorFlow Hub image feature vector modules. The problem arises because, in order to fine-tune a module, the following needs to be done:
module = hub.Module("https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/3", trainable=True, tags={"train"})
Assuming that the module is Resnet50.
In other words, the module is imported with the trainable flag set to True and with the train tag. Now, if I want to validate the model (run inference on the validation set to measure its performance), I can't switch off batch norm because of the train tag and the trainable flag.
Please note that this question has already been asked here: Tensorflow hub fine-tune and evaluate, but no answer has been provided.
I also raised a GitHub issue about it.
Looking forward to your help!
With hub.Module for TF1, the situation is as you say: either the training or the inference graph is instantiated, and there is no good way to import both and share variables between them in a single tf.Session. That's informed by the approach used by Estimators and many other training scripts in TF1 (esp. distributed ones): there's a training Session that produces checkpoints, and a separate evaluation Session that restores model weights from them. (The two will likely also differ in the dataset they read and the preprocessing they perform.)
With TF2 and its emphasis on Eager mode, this has changed. TF2-style Hub modules (as found at https://tfhub.dev/s?q=tf2-preview) are really just TF2-style SavedModels, and these don't come with multiple graph versions. Instead, the __call__ function on the restored top-level object takes an optional training=... parameter if the train/inference distinction is required.
With this, TF2 should match your expectations. See the interactive demo tf2_image_retraining.ipynb and the underlying code in tensorflow_hub/keras_layer.py for how it can be done. The TF Hub team is working on making a more complete selection of modules available for the TF2 release.
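A minimal sketch of that TF2-style usage (the module handle and classification head are illustrative; any TF2 SavedModel on tfhub.dev behaves the same way):

import tensorflow as tf
import tensorflow_hub as hub

# TF2-style module: a single SavedModel whose __call__ accepts training=...
feature_extractor = hub.KerasLayer(
    "https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/4",
    trainable=True)  # fine-tune the module's weights

model = tf.keras.Sequential([
    feature_extractor,
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Keras passes training=True during fit() and training=False during
# evaluate()/predict(), so batch norm switches modes automatically,
# without separate train/inference graphs.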
Occasionally we may encounter NaN/Inf in gradients during backprop on seq2seq TensorFlow models. How can we easily find the cause of such an issue, e.g. by locating the op and time step at which the NaN/Inf is produced?
Since the error occurs during backpropagation, we cannot simply observe the gradient values with tf.Print(). Also, in an RNN model, tf.add_check_numerics_ops() doesn't work, and we cannot use tf.check_numerics() unless we dig into the messy TF libraries or reimplement the control flow manually. Meanwhile tfdbg, as a general solution, is hard to use and extremely slow on large models.
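As one coarse way to narrow this down (a sketch in TF1-style graph code; loss and optimizer are placeholders from the existing seq2seq model, not part of the question), each dense gradient can be wrapped in tf.check_numerics so that at least the offending variable is identified:

import tensorflow as tf

grads_and_vars = optimizer.compute_gradients(loss)
checked = []
for g, v in grads_and_vars:
    if isinstance(g, tf.Tensor):  # skip None and sparse IndexedSlices gradients
        g = tf.check_numerics(g, "NaN/Inf in gradient of " + v.name)
    checked.append((g, v))
train_op = optimizer.apply_gradients(checked)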