TensorFlow 2.3 does not use GPU - python

I have a machine with eight GPUs, but TensorFlow doesn't seem to use them when training.
Local Environment
Here's some information about the environment:
tensorflow-gpu 2.3.1 is installed.
nvidia-smi command reports: NVIDIA-SMI 440.82, Driver Version: 440.82, CUDA Version: 10.2
nvcc --version command reports:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
Symptoms
When I run model.fit() with a large set of data, it doesn't seem to use the GPUs at all: nvidia-smi shows 0% utilization for all GPUs, while CPU usage ranges from 400% to 700% (it's a 16-core machine).
I suspected there was something wrong with my model (perhaps some operations cannot be compiled to CUDA, or something like that), so I tested it on a Google Colab GPU instance. There it takes 10-15 ms per step (13 s per epoch), whereas each step takes over 100 ms on my machine. This leads me to believe that my model is indeed being trained on a GPU on Google Colab.
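To see where operations are actually placed, device placement logging can help (a minimal diagnostic sketch; tf.debugging.set_log_device_placement is a TF 2.x API):
import tensorflow as tf

# Print the device each operation is assigned to (TF 2.x).
tf.debugging.set_log_device_placement(True)

# A trivial op: the placement log should mention GPU:0 if a GPU is usable.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
print(tf.matmul(a, a))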
Interesting Observations
The following code
import tensorflow as tf
tf.config.list_physical_devices()
produces this:
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
PhysicalDevice(name='/physical_device:XLA_CPU:0', device_type='XLA_CPU'),
PhysicalDevice(name='/physical_device:XLA_GPU:0', device_type='XLA_GPU'),
PhysicalDevice(name='/physical_device:XLA_GPU:1', device_type='XLA_GPU'),
PhysicalDevice(name='/physical_device:XLA_GPU:2', device_type='XLA_GPU'),
PhysicalDevice(name='/physical_device:XLA_GPU:3', device_type='XLA_GPU'),
PhysicalDevice(name='/physical_device:XLA_GPU:4', device_type='XLA_GPU'),
PhysicalDevice(name='/physical_device:XLA_GPU:5', device_type='XLA_GPU'),
PhysicalDevice(name='/physical_device:XLA_GPU:6', device_type='XLA_GPU'),
PhysicalDevice(name='/physical_device:XLA_GPU:7', device_type='XLA_GPU')]
But this
tf.test.gpu_device_name()
returns an empty string.
However, on Google Colab,
>>> tf.config.list_physical_devices()
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
PhysicalDevice(name='/physical_device:XLA_CPU:0', device_type='XLA_CPU'),
PhysicalDevice(name='/physical_device:XLA_GPU:0', device_type='XLA_GPU'),
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
>>> tf.test.gpu_device_name()
'/device:GPU:0'
The only meaningful difference I have found between my machine and Google Colab so far is that my machine exposes XLA_GPU devices only, whereas Google Colab has a plain GPU device. I'm not entirely sure whether this has anything to do with the issue. Is anyone experiencing similar issues?
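One more check worth running here (a sketch; tf.sysconfig.get_build_info() exists as of TF 2.3, and the dictionary keys below are an assumption): tensorflow-gpu 2.3 was built against CUDA 10.1, so if it cannot load the 10.1 runtime on a CUDA 10.2 machine, the GPUs might surface only as XLA_GPU devices.
import tensorflow as tf

# Which CUDA/cuDNN versions was this TensorFlow binary built against?
build = tf.sysconfig.get_build_info()
print("built for CUDA:", build.get("cuda_version"))
print("built for cuDNN:", build.get("cudnn_version"))

# An empty list here means the CUDA runtime TF expects could not be loaded.
print("usable GPUs:", tf.config.list_physical_devices("GPU"))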

Related

PyTorch CUDA GPU not utilized properly

I am trying to train a PyTorch model on my local machine, which has two GPUs: an integrated Intel GPU and an NVIDIA GeForce 930MX.
The second one is NVIDIA and thus should be used with CUDA. In fact, if I check torch.cuda.device_count() it returns 1, and torch.cuda.get_device_name() returns NVIDIA GeForce 930MX. When I run the script, however, the usage of the built-in Intel GPU goes up to 100% and then the program crashes with:
OSError: [WinError 1450] Insufficient system resources exist to complete the requested service
The usage of the targeted (NVIDIA) GPU, as seen in the Task Manager, remains at 0%, so it has not been used at all.
What configuration steps might I have messed up, and what would you propose in order to run PyTorch on the proper GPU?
*I am using the LTS versions of torch and CUDA as of the day this question was posted.
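As a sanity check (a minimal sketch with a toy model, not the actual training script), forcing both the model and the data onto the CUDA device rules out accidental default placement:
import torch
import torch.nn as nn

# Select the NVIDIA GPU explicitly; fall back to CPU if CUDA is unavailable.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    print(torch.cuda.get_device_name(0))  # expected: NVIDIA GeForce 930MX

# The model and every batch must live on the same device.
model = nn.Linear(10, 1).to(device)  # toy stand-in for the real model
x = torch.randn(32, 10, device=device)
print(model(x).device)  # should report cuda:0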

Tensorflow GPU "SSE4.1 instructions" issue inside a Docker

I am trying to run TensorFlow 1.13.1 inside a Docker container (the image with the desired configuration is evariste/autodl:gpu-latest).
The container has access to an RTX 2080 Ti GPU.
I get the following error:
2020-09-10 16:09:47.428460: F tensorflow/core/platform/cpu_feature_guard.cc:37] The TensorFlow library was compiled to use SSE4.1 instructions, but these aren't available on your machine.
SSE4.1 is an instruction set supported by the CPU, not the GPU, so you need to check whether your CPU supports it; more discussion of this topic can be found here.
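A quick way to check (a sketch for Linux; /proc/cpuinfo reflects the host CPU even inside a container):
# Does the host CPU advertise SSE4.1? (Linux only)
with open("/proc/cpuinfo") as f:
    info = f.read()
print("SSE4.1 supported:", "sse4_1" in info)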

pixel-cnn (tensorflow-gpu) not recognising GPU

I'm trying to run the pixel-cnn neural network available on GitHub. Following the instructions in README.md, I run the following command in cmd:
train.py -i ./data_dir/ -o ./save_dir -g 1
I'm using one GPU, and I created the two folders ./data_dir and ./save_dir in the same directory as train.py for loading and saving the data. When doing so, I get the following error message:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation model_1/ones: node model_1/ones (defined at \OneDrive - MNG\Matura Arbeit\Projects\pixel-cnn-master\pixel_cnn_pp\model.py:36) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device. The requested device appears to be a GPU, but CUDA is not enabled.
It seems that TensorFlow doesn't recognise the GPU, but when I check the devices available to TensorFlow (as described here), both my CPU and GPU show up, as "/device:CPU:0" and "/device:GPU:0". Also, other programs run perfectly fine with tensorflow-gpu.
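For reference, the usual device check is the following (a sketch using the TF 1.x device_lib utility; an assumption about which check the linked answer describes):
from tensorflow.python.client import device_lib

# List every device TensorFlow can see (CPU and GPU entries).
print(device_lib.list_local_devices())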
(Edited:) I am using an Anaconda environment (Windows 10) with tensorflow-gpu==1.14.0. The GPU I'm using is a GTX 1050 Ti with Max-Q Design, on driver version 436.30. As for CUDA, I'm pretty sure I have installed version 10.0, as shown by nvcc --version, although nvidia-smi reports that CUDA version 10.1 is installed.
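One thing worth ruling out (a sketch using the TF 1.x session API; this is not part of the pixel-cnn code) is whether a session with soft placement can place anything on the GPU at all:
import tensorflow as tf

# TF 1.x: allow ops pinned to a missing device to fall back to the CPU,
# and log where every op actually lands.
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.device("/device:GPU:0"):
    x = tf.ones([2, 2])
with tf.Session(config=config) as sess:
    print(sess.run(x))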

Minimum required hardware to install tensorflow-gpu in Python

I have tried many PCs with different hardware capabilities to install TensorFlow on GPU; they are either incompatible, or compatible but stuck at some point. I would like to know the minimum hardware required to install tensorflow-gpu. I would also like to ask about some specific hardware:
Can I use a Core i5 instead of a Core i7?
Is a 4 GB GPU enough for training the dataset?
Is 8 GB of RAM enough for training and evaluating the dataset? Many thanks.
TensorFlow (TF) GPU 1.6 and above requires a CUDA compute capability of 3.5 or higher and AVX instruction support.
https://www.tensorflow.org/install/gpu#hardware_requirements.
https://www.tensorflow.org/install/pip#hardware-requirements.
Therefore you would want to buy a graphics card with a compute capability of 3.5 or higher.
Here's a link that shows the compute capability of various NVIDIA graphics cards: https://developer.nvidia.com/cuda-gpus.
However, if your CUDA compute capability is below 3.5, you have to compile TF from source yourself. This procedure may or may not work, depending on the build flags you choose while compiling, and it is not straightforward.
In my humble opinion, the simplest way is to install TF GPU from the pre-built binaries.
To answer your questions: yes, you can use TF comfortably on an i5 with a 4 GB graphics card and 8 GB of RAM. Training may take longer, though, depending on the task at hand.
In summary, the main hardware requirement for installing TF GPU is an NVIDIA graphics card with a CUDA compute capability of 3.5 or higher; the more, the merrier.
Note that TF officially supports only NVIDIA graphics cards.
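To read a card's compute capability programmatically (a sketch; tf.config.experimental.get_device_details was added in TF 2.4, so this assumes a reasonably recent TF):
import tensorflow as tf

# Requires TF 2.4+; compute capability is reported as a (major, minor) tuple.
for gpu in tf.config.list_physical_devices("GPU"):
    details = tf.config.experimental.get_device_details(gpu)
    print(gpu.name, details.get("compute_capability"))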
You should find your answers here:
https://www.nvidia.com/en-gb/data-center/gpu-accelerated-applications/tensorflow/
From the link:
The GPU-enabled version of TensorFlow has the following requirements:
64-bit Linux
Python 2.7
CUDA 7.5 (CUDA 8.0 required for Pascal GPUs)
cuDNN v5.1 (cuDNN v6 if on TF v1.3)

Does TensorFlow take all the resources of the GPU, making other CUDA code slow?

System information
OS Platform and Distribution: Linux Ubuntu 16.04
TensorFlow version: tensorflow-gpu (1.7.0)
Python version: Python 3.5.2
CUDA/cuDNN version: CUDA 9.0 cuDNN 7
Describe the problem
I have a CUDA library, built from C++, for post-processing the prediction results of a TensorFlow model.
I use the following approach to make the CUDA code callable from Python:
import ctypes
lib = ctypes.cdll.LoadLibrary("my.so")  # load the compiled CUDA library
result = lib.post_process(tensorflow_result)
If I test the CUDA code alone, without TensorFlow, it works fine. (I save the result from TensorFlow and use cv2.imread to feed it into my CUDA code.)
But when TensorFlow is used in my project, my CUDA code becomes 10 times slower.
My timing log is inside the CUDA .so library, so there is no way the gap comes from the Python-to-.so wrapping.
I have tried to set the fraction of GPU memory to be allocated by TensorFlow:
# Assume that you have 12GB of GPU memory and want to allocate ~4GB:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
but it was useless.
So I wonder: does TensorFlow take all the resources of the GPU, making other CUDA code slow?
Is the only solution to register my CUDA code as a TensorFlow op?
Any suggestions? Thanks!
Update
I have tested what #AnandCU suggested:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
but it doesn't speed up my CUDA code to the level I measured when testing it alone, without TensorFlow.
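One more thing that might be worth trying (a sketch, not a confirmed fix): if the machine has more than one GPU, pinning TensorFlow to a single device leaves the other free for the standalone CUDA library.
import os

# Must be set before TensorFlow initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # TF will only see GPU 0

import tensorflow as tf  # TF 1.x, matching the versions above

sess = tf.Session()  # allocates memory on GPU 0 only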
