Unable to Run Tensorflow/Keras with GPU - python

I tried to run Keras with my GPU but got the following error:
C:\Python36\lib\site-packages\skimage\transform_warps.py:84:
UserWarning: The default mode, 'constant', will be changed to
'reflect' in skimage 0.15. warn("The default mode, 'constant', will
be changed to 'reflect' in " E
C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:378]
Loaded runtime CuDNN library: 7102 (compatibility version 7100) but
source was compiled with 7003 (compatibility version 7000). If using
a binary install, upgrade your CuDNN library to match. If building
from sources, make sure the library loaded at runtime matches a
compatible version specified during compile configuration.
F
C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\kernels\conv_ops.cc:717]
Check failed: stream->parent()->GetConvolveAlgorithms(
conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
I have tensorflow 1.6, CUDA version: Cuda compilation tools, release 9.0, V9.0.176
Does anyone know what's wrong here?

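You need to install cuDNN 7.0.5. The log shows that this TensorFlow build was compiled against cuDNN 7.0.x (7003) but loaded cuDNN 7.1.2 (7102) at runtime, so the installed library is newer than the build expects. The file can be downloaded from NVIDIA's cuDNN download page. After clicking Download and agreeing to the terms, the option will be listed.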
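The two numbers in the error message are easier to compare once decoded. A minimal sketch, assuming the integer is major*1000 + minor*100 + patch (which is consistent with the "compatibility version" figures printed next to them in the log); the helper names are made up for illustration:

```python
def decode_cudnn_version(version):
    """Split an integer such as 7102 into (major, minor, patch)."""
    major, remainder = divmod(version, 1000)
    minor, patch = divmod(remainder, 100)
    return major, minor, patch


def compatibility_version(version):
    """Drop the patch level, like the 'compatibility version' in the log."""
    return version - version % 100


loaded, compiled = 7102, 7003  # values copied from the error message

print(decode_cudnn_version(loaded))    # (7, 1, 2)
print(decode_cudnn_version(compiled))  # (7, 0, 3)
print(compatibility_version(loaded), compatibility_version(compiled))  # 7100 7000
```

The loaded library is 7.1.x while the build expects 7.0.x, which is why installing a 7.0.x release such as 7.0.5 resolves the mismatch.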

Related

RuntimeError: Not compiled with GPU support

I'm trying to run some code and I get this error:
RuntimeError: Not compiled with GPU support
I searched around and realized it might be that my CUDA version has some issues.
I installed the newest CUDA 11.5 first and then realized that PyTorch doesn't support that version, so I uninstalled CUDA 11.5 and installed CUDA 10.2 instead.
I already deleted everything that is related to CUDA 11.5 but when I run
python -c 'import torch; from torch.utils.cpp_extension import CUDA_HOME; print(torch.cuda.is_available(), CUDA_HOME)'
I still get C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5
instead of C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
I already went to the directory C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA to manually delete v11.5 but the CUDA_HOME doesn't change.
Any idea how to manually change the CUDA_HOME? Is there a way to delete CUDA 11.5 completely since I only want to keep CUDA 10.2
It seems the CUDA-enabled build of PyTorch is not installed in the environment you are working in. You can confirm this by running the following in a Python prompt:
import torch
torch.cuda.is_available()
True
If it returns False, install a CUDA-enabled build of PyTorch using the instructions at
https://pytorch.org/get-started/locally/
For example, on Linux:
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
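If CUDA_HOME still points at the removed v11.5 folder, the stale value probably comes from the environment rather than from the files on disk. A rough sketch for inspecting the usual candidates, assuming (this is not confirmed in the question) that CUDA_HOME is derived from the CUDA_HOME or CUDA_PATH environment variables or from the nvcc found on PATH; cuda_location_hints is a hypothetical helper, not part of PyTorch:

```python
import os
import shutil


def cuda_location_hints(environ=None):
    """Collect the settings that commonly decide which CUDA toolkit is picked up."""
    environ = os.environ if environ is None else environ
    return {
        "CUDA_HOME": environ.get("CUDA_HOME"),
        "CUDA_PATH": environ.get("CUDA_PATH"),
        # Location of the nvcc compiler on the given PATH, or None if absent.
        "nvcc": shutil.which("nvcc", path=environ.get("PATH")),
    }


for name, value in cuda_location_hints().items():
    print(f"{name}: {value}")
```

If any of these still mentions v11.5, updating or removing that environment variable (and the matching PATH entry) is the place to start.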

Your cuDNN version is more recent than Theano

I have been trying to enable CUDA to run PyMC3 with the assistance of the GPU. Here are the specs of the machine/software I have been using:
Windows 10
Visual Studio Community 2019
Python 3.8.12
CUDA 10.2 (I tried 11.2 before that and obtained the same problem)
CuDNN 7.6.5 (I tried 8.1 with CUDA 11.2 and obtained the same problem)
TensorFlow 2.7.0
Theano-PyMC 1.1.2
Aesara 2.3.2 (the successor to Theano)
PyMC3 3.11.4
MKL 2.4.0
For the proper installation of Theano and CUDA in a Windows environment, I followed the advice provided on these web pages:
https://gist.github.com/ElefHead/93becdc9e99f2a9e4d2525a59f64b574
https://towardsdatascience.com/installing-tensorflow-with-cuda-cudnn-and-gpu-support-on-windows-10-60693e46e781
I have tested the installation against Tensorflow and it works. I have also used the tests provided on the Theano and Aesara "Read the Docs" sites (https://aesara.readthedocs.io/en/latest/tutorial/using_gpu.html#testing-the-gpu) and ran the check_blas test provided with Theano/Aesara (https://raw.githubusercontent.com/Theano/Theano/master/theano/misc/check_blas.py). After all this, I still get these disappointing error/warning messages:
WARNING (aesara.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
UserWarning: Your cuDNN version is more recent than Aesara. If you encounter problems, try updating Aesara or downgrading cuDNN to a version >= v5 and <= v7
even though I have already downgraded cuDNN to 7.6.5 (and, obviously, can't use the GPU with Theano/Aesara/PyMC3).
With respect to the BLAS warning, I tried setting the blas__ldflags (Aesara) or blas.ldflags (Theano) as environment variables, assigning them the recommended MKL values -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lguide -liomp5 -lmkl_mc -lpthread, still nothing works.
Can anybody please help me address these two issues?
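The question does not show how the BLAS options were set, so one thing worth ruling out is the mechanism itself: Theano and Aesara read configuration options from the THEANO_FLAGS and AESARA_FLAGS environment variables (as option=value entries), not from environment variables named after the individual option. A minimal sketch under that assumption; config_flag is a hypothetical helper, and the MKL flags are simply the ones quoted above, not verified against any particular MKL install:

```python
import os

# MKL link flags quoted from the question; not verified here.
MKL_LDFLAGS = (
    "-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core "
    "-lguide -liomp5 -lmkl_mc -lpthread"
)


def config_flag(option, value):
    """Format one 'option=value' entry for THEANO_FLAGS / AESARA_FLAGS."""
    return f"{option}={value}"


# Set these before importing theano/aesara: the configuration is read at import time.
os.environ["AESARA_FLAGS"] = config_flag("blas__ldflags", MKL_LDFLAGS)
os.environ["THEANO_FLAGS"] = config_flag("blas.ldflags", MKL_LDFLAGS)

print(os.environ["AESARA_FLAGS"])
```

This only addresses how the option reaches the library; it does not by itself explain the cuDNN warning.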

Could not load dynamic library 'libnvinfer.so.6'

I am simply trying to import the TensorFlow Python package, but I get the following error:
2020-02-23 19:01:06.163940: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-02-23 19:01:06.164019: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-02-23 19:01:06.164030: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
<module 'tensorflow_core._api.v2.version' from '/home/saman/miniconda3/envs/testconda/lib/python3.7/site-packages/tensorflow_core/_api/v2/version/__init__.py'
This is a warning, not an error. You can still use TensorFlow. The shared libraries libnvinfer and libnvinfer_plugin are optional and are required only if you use NVIDIA's TensorRT capabilities.
To suppress this and all other warnings, set the environment variable TF_CPP_MIN_LOG_LEVEL="2".
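From Python, the variable has to be set before TensorFlow is imported, because the native logging level is read when the library loads. A small sketch; the try/except is only there so the snippet also runs on a machine without TensorFlow:

```python
import os

# 0 = show everything, 1 = hide INFO, 2 = hide INFO and WARNING, 3 = hide ERROR as well.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

try:
    import tensorflow as tf  # imported only after the variable is set
    print(tf.__version__)
except ImportError:
    print("TensorFlow is not installed in this environment")
```

Note that level 2 hides warnings only; messages logged at error level are still printed.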
TensorFlow's installation instructions list the GPU dependencies (current as of December 13 2022):
The following NVIDIA® software are only required for GPU support.
NVIDIA® GPU drivers version 450.80.02 or higher.
CUDA® Toolkit 11.2.
cuDNN SDK 8.1.0.
(Optional) TensorRT to improve latency and throughput for inference.
I got this warning as a result of an accidental update of the libnvinfer6 package. It was updated to 6.0.1-1+cuda10.2, while the original installation used 6.0.1-1+cuda10.1.
After I uninstalled packages referencing cuda10.2 and re-ran
sudo apt-get install -y --no-install-recommends libnvinfer6=6.0.1-1+cuda10.1 \
libnvinfer-dev=6.0.1-1+cuda10.1 \
libnvinfer-plugin6=6.0.1-1+cuda10.1
this warning went away.
Most of these messages are warnings, not errors. They just mean that the libraries for using an Nvidia GPU are not installed, but you don't need an Nvidia GPU to use TensorFlow, so you don't need these libraries. The comment by jakub explains how to turn off the warnings:
export TF_CPP_MIN_LOG_LEVEL="2"
However, I too run Tensorflow without Nvidia stuff and there is one more message that is an error, not a warning:
2020-04-10 10:04:13.365696: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
It should be irrelevant because it too refers to CUDA, which is for Nvidia. It doesn't seem to be a fatal error, though.
A little extra information on top of jakub's answer: this can occur if you haven't installed the 'machine-learning' repo. Try this if you have already installed CUDA successfully but are still getting the error.
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update
Then install TensorRT. This requires that libcudnn7 is already installed.
sudo apt-get install -y --no-install-recommends libnvinfer6=6.0.1-1+cuda10.1 \
libnvinfer-dev=6.0.1-1+cuda10.1 \
libnvinfer-plugin6=6.0.1-1+cuda10.1
You can download the TensorRT 6 zip file and copy the contents of its x86 Linux folder to /usr/lib/cuda. Make sure the lib folder inside the downloaded x86_linux folder is renamed to lib64. After copying all the files into the cuda directory, reboot the system. CUDA and the TensorRT engine should then run smoothly on your system.
I spent about 5 hours solving this issue. In my case, I believe it means that you have the wrong version of the library. libnvinfer.so.6 is located in 'TensorRT-*/lib', and the number 6 means TensorFlow is looking for the libnvinfer of TensorRT 6. So if the message is "could not load dynamic library libnvinfer.so.5", you need TensorRT 5 to run the code.
Likewise, if it shows Could not load dynamic library 'libcudart.so.10.0', you need the library from CUDA 10.0 to run the code.
So updating your TensorRT/CUDA/cuDNN to match your TensorFlow version should help. Note that your TensorRT, CUDA, and cuDNN versions must also match each other.
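The library name and version that TensorFlow is looking for can be pulled straight out of the log text. A minimal sketch, assuming the message format shown in the question ("Could not load dynamic library '<name>.so.<version>'"); missing_libraries is a hypothetical helper:

```python
import re

# Matches e.g. "Could not load dynamic library 'libnvinfer.so.6'".
PATTERN = re.compile(
    r"Could not load dynamic library '(?P<name>lib[\w+-]+)\.so\.(?P<version>[\d.]+)'"
)


def missing_libraries(log_text):
    """Return (library name, expected version) pairs found in a TensorFlow log."""
    return [(m["name"], m["version"]) for m in PATTERN.finditer(log_text)]


log = """
W dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: ...
W dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: ...
W dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: ...
"""

for name, version in missing_libraries(log):
    print(f"{name}: version {version} expected")
```

Here libnvinfer and libnvinfer_plugin point to TensorRT 6, and libcudart.so.10.0 points to CUDA 10.0.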

What does CuDNN compatibility error with tensorflow mean and how to fix it?

How can I fix this error? Also, I don't see a cuDNN 7.0.03 version available, so any lead would be helpful.
BELOW IS THE ERROR:
Created TensorFlow device
(/job:localhost/replica:0/task:0/device:GPU:0 with 3093 MB memory) ->
physical GPU (device: 0, name: GeForce 840M, pci bus id: 0000:04:00.0,
compute capability: 5.0) 2019-10-14 01:12:58.508334: E
T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_dnn.cc:396]
Loaded runtime CuDNN library: 7501 (compatibility version 7500) but
source was compiled with 7003 (compatibility version 7000). If using
a binary install, upgrade your CuDNN library to match. If building
from sources, make sure the library loaded at runtime matches a
compatible version specified during compile configuration. 2019-10-14
01:12:58.521420: F
T:\src\github\tensorflow\tensorflow\core\kernels\conv_ops.cc:712]
Check failed: stream->parent()->GetConvolveAlgorithms(
conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
First, if you want to use tensorflow-gpu, you need to get rid of the plain tensorflow 1.7 installation: uninstall it completely and keep only tensorflow-gpu.
Also, judging by your dependency versions, TF 1.7 requires cuDNN 7.0, which is also what the error message says, so downgrade your cuDNN to 7.0. For all the compatible versions, check the version compatibility list for TensorFlow releases.

Which version of tensor flow should I install?

I want to install TensorFlow on a machine with a Tesla K40 GPU. I obtained the CUDA version using
cat /usr/local/cuda/version.txt
and CuDNN version is 7.1.4.
In the TensorFlow documentation I couldn't find any TensorFlow version that matches my versions.
I tried installing lower cuDNN versions, from 5 to 6.1, and got the following error:
ImportError: libcublas.so.8.O: cannot open shared object file: No such file or directory
I'm well aware that tensorflow is looking for CUDA 9.0. I cannot upgrade the CUDA as I am using a shared GPU-server. Any help is appreciated.
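Since the version suffix of the library in the ImportError reflects the CUDA release a given TensorFlow build expects, it helps to have the installed CUDA version in a comparable form. A minimal sketch, assuming version.txt contains a line such as "CUDA Version 8.0.61" (the question does not show the file's contents); parse_cuda_version is a hypothetical helper:

```python
import re
from pathlib import Path


def parse_cuda_version(text):
    """Return (major, minor) from text like 'CUDA Version 8.0.61', or None."""
    match = re.search(r"CUDA Version\s+(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


version_file = Path("/usr/local/cuda/version.txt")  # path used in the question
if version_file.exists():
    print(parse_cuda_version(version_file.read_text()))
else:
    print("version.txt not found")

print(parse_cuda_version("CUDA Version 8.0.61"))  # (8, 0)
```

With the CUDA toolkit fixed on a shared server, the TensorFlow release has to be chosen to match this (major, minor) pair rather than the other way around.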
