I've recently been using a TF object detection model I created for an ML project I'm working on, but I feel like I'm getting subpar inference performance from my GPU, an RTX 3060 Ti. I am using TF 2.4, CUDA 11.2, and cuDNN 8.0.4. I was wondering if there is a way I could benchmark my install against other users' setups to compare speeds. I am running Windows 10 currently, but soon want to try running TF in a container with Ubuntu.
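In case it helps to compare numbers, something like this rough timing loop is what I have in mind (the SavedModel path and input shape are placeholders from my own object detection setup, not a standard benchmark):

import time
import numpy as np
import tensorflow as tf

model = tf.saved_model.load("exported_model/saved_model")  # placeholder path
infer = model.signatures["serving_default"]
dummy = tf.constant(np.random.randint(0, 255, (1, 640, 640, 3), dtype=np.uint8))

infer(dummy)  # warm-up so graph tracing and cuDNN autotuning don't skew the timing
start = time.perf_counter()
for _ in range(100):
    infer(dummy)
print("mean inference time: %.1f ms" % ((time.perf_counter() - start) / 100 * 1000))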
Related
I'm trying to get started with TensorFlow using the Python interface. I'm building the image classification system explained here. But when running the code, epochs take too much time, almost 2 minutes per epoch, and if the number of steps is increased, the epoch running time increases exponentially.
My system configuration is:
and my software configuration is:
Python 3.7
Spyder 4
Tensorflow 2.2.0
I found a similar thread here, but in my case the basic operations are fast enough.
How can I improve the performance of TensorFlow?
Sadly, both TensorFlow and PyTorch use only CUDA as a backend for GPU acceleration, and they don't support any of the Mac GPUs (Intel or AMD). This means that your TensorFlow code would run only on the CPU.
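A quick way to confirm what TensorFlow actually sees (TF 2.x API) is:

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))  # an empty list means everything runs on the CPU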
I wanted to train an image classification CNN and used Keras for it.
The image dimensions are 300x300x3.
I have trained a CNN with 2M parameters; I used Keras's MobileNet for transfer learning, froze the last 63 layers, and added dense layers at the end, with the last layer having 2 units and a softmax activation.
To make predictions, I load the .h5 file and use OpenCV's video capture to get video frames; for each frame I call model.predict(img_array).
When I look at the Windows 10 Task Manager, I see that the Python script uses 80% of my CPU but only 2% of my GPU. This CPU usage causes lag on my laptop.
How can I reduce the CPU usage and force Keras to run its computations on the GPU?
I have an Nvidia RTX 2060 4GB and an Intel Core i7-9750H in my laptop.
Tensorflow 2.1 and Keras 2.3.1
OpenCV 4.1
I have tried the following, but nothing actually changes.
import tensorflow as tf

tf.config.threading.set_inter_op_parallelism_threads(12)
tf.config.threading.set_intra_op_parallelism_threads(12)

with tf.device('/gpu:0'):
    model.predict(img_array)
Best regards.
Edit:
I reduced the CPU usage to 20% by setting the steps parameter in the predict method.
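For reference, the edited prediction loop now looks roughly like this (the model path and preprocessing are simplified placeholders from my actual script):

import cv2
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("model.h5")  # placeholder path
cap = cv2.VideoCapture(0)
ret, frame = cap.read()
img_array = np.expand_dims(cv2.resize(frame, (300, 300)) / 255.0, axis=0)
with tf.device('/gpu:0'):
    preds = model.predict(img_array, steps=1)  # steps=1, since I predict one frame per call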
Please check your pip list or conda list.
Sometimes, we mistakenly install both tensorflow and tensorflow-gpu.
If you have both, the system will automatically pick tensorflow, which is the CPU-only build.
If that is the case, DELETE "tensorflow", keeping only "tensorflow-gpu".
If you do not see tensorflow-gpu in the first place, try installing it with conda using the following commands:
conda create -n [EnvironmentName] python=3.6
conda activate [EnvironmentName]
conda install -c conda-forge tensorflow-gpu==1.14
Conda will work out which versions (CUDA, cuDNN, etc.) you require and download and install them directly into your environment. Then run your Python file from this environment. Good luck ^_^
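Once the environment is set up, a quick sanity check with the TF 1.x API would be something like:

import tensorflow as tf
print(tf.test.is_gpu_available())   # should print True if the GPU build is active
print(tf.test.gpu_device_name())    # e.g. "/device:GPU:0"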
I have installed Keras with GPU support in R, based on TensorFlow with GPU support. It was installed with these steps:
https://towardsdatascience.com/installing-tensorflow-with-cuda-cudnn-and-gpu-support-on-windows-10-60693e46e781
If I run the Boston Housing example code from the book Deep Learning with R, I get this screen:
Can I conclude that the code runs on the GPU?
Or is this line from the picture above giving an error:
GPU libraries are statically linked, skip dlopen check.
While the code is running, the GPU is only at 3% of capacity while the CPU is at 20-25%.
The code is NOT running faster than when I initially ran it without GPU support installed.
Thank you!
Yes, TensorFlow is running with the GPU enabled. Boston Housing is a relatively small dataset and probably does not benefit much from using the GPU. The line below indicates it is running on the GPU: "Created tensorflow device (/job:localhost/replica:0/task:0/device:GPU:0".
From the guide at TensorFlow:
You can set tf.debugging.set_log_device_placement(True) in order to see explicitly where each operation is running. The R equivalent is below.
library(tensorflow)
tf$debugging$set_log_device_placement(TRUE)
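For comparison, the same check in Python with a small op (the exact log output depends on your setup):

import tensorflow as tf
tf.debugging.set_log_device_placement(True)
a = tf.random.normal((1000, 1000))
b = tf.matmul(a, a)  # the placement log should show this op on /device:GPU:0 when the GPU is used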
I have converted a TensorFlow inference graph to a tflite model file (*.tflite), following the instructions at https://www.tensorflow.org/lite/convert.
I tested the tflite model on my GPU server, which has 4 Nvidia TITAN GPUs. I used tf.lite.Interpreter to load and run the tflite model file.
It works the same as the original TensorFlow graph; however, the problem is that inference became much slower. When I looked into the reason, I found that GPU utilization is simply 0% while tf.lite.Interpreter is running.
Is there any way I can run tf.lite.Interpreter with GPU support?
https://github.com/tensorflow/tensorflow/issues/34536
The CPU is kind of good enough for tflite, especially a multicore one.
An NVIDIA GPU is likely not supported for tflite, whose GPU support targets mobile platforms.
Conspiracy theory: did they (TF and NVIDIA) shake hands to not let TFLite work on the GPU? It would be too easy to make one.
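If you stay on the CPU, at least let the interpreter use several cores; num_threads is available in recent TF releases (the model path below is a placeholder):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite", num_threads=4)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
out = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])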
Steve
System information
OS Platform and Distribution: Linux Ubuntu 16.04
TensorFlow version: tensorflow-gpu (1.7.0)
Python version: Python 3.5.2
CUDA/cuDNN version: CUDA 9.0 cuDNN 7
Describe the problem
I have a CUDA library built from C++ for post-processing the prediction results of a TensorFlow model.
I use the following approach to make Python able to call the CUDA code from C++:
import ctypes

lib = ctypes.cdll.LoadLibrary("my.so")
result = lib.post_process(tensorflow_result)
If I test the CUDA code alone without TensorFlow, it works fine. (I save the result from TensorFlow, then use cv2.imread to feed it into my CUDA code.)
But when TensorFlow is used in my project, my CUDA code becomes 10 times slower....
My timing log is inside the CUDA .so library, so there is no way the gap comes from the Python-to-.so wrapper.
I have tried to set the fraction of GPU memory allocated by TensorFlow with:
# Assume that you have 12GB of GPU memory and want to allocate ~4GB:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
but it was useless....
So I wonder: does TensorFlow take up all the GPU resources, making other CUDA code slow?
Is the only solution to register my CUDA code as a custom TensorFlow op?
Any suggestion? Thanks~~~
----------------------Update----------------------
I have tested what #AnandCU suggested:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
but it doesn't make my CUDA code as fast as when I test it alone without TensorFlow.
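For completeness, the two settings can also be combined in one config (TF 1.x API), though based on the results above I don't expect it to change the picture:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True                      # grow the allocation on demand
config.gpu_options.per_process_gpu_memory_fraction = 0.333  # and cap it at roughly a third of the card
sess = tf.Session(config=config)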