I want to install TensorFlow on a machine with a Tesla K40 GPU. I got the CUDA version using
cat /usr/local/cuda/version.txt
and the cuDNN version is 7.1.4.
When I checked the TensorFlow documentation, I couldn't find any TensorFlow version that matches my versions.
I tried installing lower cuDNN versions (5 to 6.1) and got the following error:
ImportError: libcublas.so.8.0: cannot open shared object file: No such file or directory
I am aware that TensorFlow is looking for CUDA 9.0, but I cannot upgrade CUDA because I am using a shared GPU server. Any help is appreciated.
The PyTorch website says that PyTorch 1.12.1 is compatible with CUDA 11.6, but I get the following error:
NVIDIA GeForce RTX 3060 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
I am using a laptop RTX 3060 and Poetry as my package manager in Python.
>>> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0
>>> poetry show
certifi 2022.9.24 Python package for providing Mozilla's CA Bundle.
charset-normalizer 2.1.1 The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
idna 3.4 Internationalized Domain Names in Applications (IDNA)
numpy 1.23.4 NumPy is the fundamental package for array computing with Python.
opencv-contrib-python 4.6.0.66 Wrapper package for OpenCV python bindings.
opencv-python 4.6.0.66 Wrapper package for OpenCV python bindings.
pillow 9.2.0 Python Imaging Library (Fork)
requests 2.28.1 Python HTTP for Humans.
torch 1.12.1 Tensors and Dynamic neural networks in Python with strong GPU acceleration
torchvision 0.13.1 image and video datasets and models for torch deep learning
typing-extensions 4.4.0 Backported and Experimental Type Hints for Python 3.7+
urllib3 1.26.12 HTTP library with thread-safe connection pooling, file post, and more.
What am I missing here? Is this a PyTorch <> CUDA issue or a CUDA <> GPU issue?
NVIDIA GeForce RTX 3060 Laptop GPU with CUDA capability sm_86 is not
compatible with the current PyTorch installation. The current PyTorch
install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
The PyTorch build you have installed doesn't have binary support for your GPU, because whoever built it chose to build it that way. This isn't a question of CUDA versions or PyTorch versions. It is just that many frameworks are built for a limited range of binary architectures in order to keep their distributed packages small.
NVIDIA provides a method for forward-compatible architectures to run older code through JIT recompilation at runtime. Unfortunately, the standard PyTorch build system doesn't use it, in order to save space in its build distributions, so that cannot help you in this situation.
Your only solution is to source another build that includes the appropriate binary support for your GPU.
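As a rough illustration of the check described above, the sketch below compares a GPU's compute capability with the architecture list a build was compiled for. The helper name `is_arch_supported` is hypothetical, and the matching rule (an `sm_XY` binary runs on devices with the same major version and an equal or higher minor version, while a `compute_XY` PTX entry can be JIT-compiled for any equal or newer device) is my reading of CUDA's compatibility rules, not PyTorch's exact internal check. `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are the PyTorch calls that supply the real values.

```python
import re

# Matches entries such as "sm_37", "sm_86", "compute_80" or "sm_90a".
_ARCH = re.compile(r"(sm|compute)_(\d+)(\d)[a-z]?")


def is_arch_supported(capability, arch_list):
    """Return True if a device with `capability` can run a build made for `arch_list`."""
    major, minor = capability
    for arch in arch_list:
        match = _ARCH.fullmatch(arch)
        if match is None:
            continue  # Skip entries this sketch does not understand.
        kind = match.group(1)
        arch_major, arch_minor = int(match.group(2)), int(match.group(3))
        # Binary code: same major version, minor version not newer than the device.
        if kind == "sm" and arch_major == major and arch_minor <= minor:
            return True
        # PTX: can be JIT-compiled for any device at least as new as the target.
        if kind == "compute" and (arch_major, arch_minor) <= (major, minor):
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print(cap, archs, is_arch_supported(cap, archs))
```

With the values from the error message, `is_arch_supported((8, 6), ["sm_37", "sm_50", "sm_60", "sm_70"])` is false, which matches the warning: none of the listed architectures shares major version 8 with the RTX 3060.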
I have been trying to enable CUDA to run PyMC3 with the assistance of the GPU. Here are the specs of the machine/software I have been using:
Windows 10
Visual Studio Community 2019
Python 3.8.12
CUDA 10.2 (I tried 11.2 before that and obtained the same problem)
CuDNN 7.6.5 (I tried 8.1 with CUDA 11.2 and obtained the same problem)
TensorFlow 2.7.0
Theano-PyMC 1.1.2
Aesara 2.3.2 (the successor to Theano)
PyMC3 3.11.4
MKL 2.4.0
For the proper installation of Theano and CUDA in a Windows environment, I followed the advice provided on these web pages:
https://gist.github.com/ElefHead/93becdc9e99f2a9e4d2525a59f64b574
https://towardsdatascience.com/installing-tensorflow-with-cuda-cudnn-and-gpu-support-on-windows-10-60693e46e781
I have tested the installation with TensorFlow, and it works. I also ran the tests provided on the Theano and Aesara "Read the Docs" sites (https://aesara.readthedocs.io/en/latest/tutorial/using_gpu.html#testing-the-gpu) and the check_blas test provided with Theano/Aesara (https://raw.githubusercontent.com/Theano/Theano/master/theano/misc/check_blas.py). After all this, I still get these disappointing error/warning messages:
WARNING (aesara.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
UserWarning: Your cuDNN version is more recent than Aesara. If you encounter problems, try updating Aesara or downgrading cuDNN to a version >= v5 and <= v7
even though I have already downgraded cuDNN to 7.6.5 (and, obviously, can't use the GPU with Theano/Aesara/PyMC3).
With respect to the BLAS warning, I tried setting blas__ldflags (Aesara) or blas.ldflags (Theano) through environment variables, assigning them the recommended MKL values -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lguide -liomp5 -lmkl_mc -lpthread, but still nothing works.
Can anybody please help me address these two issues?
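For reference, the configuration attempt described above might look like the sketch below. It assumes the option is passed through the `AESARA_FLAGS` environment variable as comma-separated `key=value` pairs and that the variable is set before Aesara is imported. The helper names are hypothetical, the flag string is copied from the question, and whether those libraries resolve on a given Windows/MKL install is not something this sketch can confirm.

```python
import os

# Linker flags copied from the question; they may not suit every MKL install.
MKL_LDFLAGS = (
    "-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core "
    "-lguide -liomp5 -lmkl_mc -lpthread"
)


def build_flags(options):
    """Join options into the 'key=value,key=value' form used by AESARA_FLAGS."""
    return ",".join(f"{key}={value}" for key, value in options.items())


def configure_blas(env=os.environ):
    """Store the BLAS linker flags in `env` (defaults to the process environment)."""
    env["AESARA_FLAGS"] = build_flags({"blas__ldflags": MKL_LDFLAGS})
    return env["AESARA_FLAGS"]


if __name__ == "__main__":
    print(configure_blas())
    # Assumption: Aesara reads the variable at import time,
    # so the import has to come after this call.
    # import aesara
```

For Theano the analogous variable would be `THEANO_FLAGS` with the key `blas.ldflags`.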
I am having some issues with TensorFlow, which does not seem to detect my GPU.
When running some code using TensorFlow, I get the error:
tensorflow/stream_executor/cuda/cuda_driver.cc:328]
failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Here's my config:
Nvidia GeForce RTX 3080 Ti
Ubuntu 18.04
CUDA 11.4, driver 470.57.02
TensorFlow 2.5
My GPU is detected correctly (I checked with nvidia-smi), and tf.test.is_gpu_available() returns True.
I tried downgrading the CUDA version and the driver, but nothing changed.
Does anybody have any hints on how to solve this? Thanks a lot!
You would need to install a package built with the same CUDA environment to ensure compatibility. TensorFlow 2.5 is compatible with CUDA 11.2.
Take a look at the tested build configurations.
The issue occurs because TensorFlow 2.5 is compatible with CUDA 11.2. So, just downgrade (reinstall) your CUDA to 11.2.
https://developer.nvidia.com/cuda-11.2.0-download-archive
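To see which CUDA version an installed TensorFlow was built against before reinstalling anything, a minimal sketch along these lines may help. It assumes TensorFlow 2.3 or later, where `tf.sysconfig.get_build_info()` is available. The helper names are hypothetical, and the "same major.minor" comparison simply follows the advice in the answer above; it is not an official compatibility rule.

```python
import re


def major_minor(version):
    """Reduce a string such as '11.2.0' to (11, 2); return None if it does not parse."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cuda_matches(built_with, installed):
    """True when both version strings parse and share the same major.minor pair."""
    built, local = major_minor(built_with), major_minor(installed)
    return built is not None and built == local


def tensorflow_build_cuda():
    """Return the CUDA version TensorFlow was built with, or None if unavailable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # CPU-only builds may not report a CUDA version, hence .get().
    return tf.sysconfig.get_build_info().get("cuda_version")


if __name__ == "__main__":
    built = tensorflow_build_cuda()
    installed = "11.4"  # the CUDA version reported in the question
    if built is not None:
        print("built with CUDA", built, "| matches installed:", cuda_matches(built, installed))
```

For the configuration in the question, `cuda_matches("11.2", "11.4")` is false, which is the mismatch the answer points at.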
I want to use GPU & Anaconda environment on Linux.
I believe I have matched the versions of each component, but it doesn't work.
CUDA and cuDNN were installed using conda.
The versions of each module and driver are listed below:
・GPU:RTX 2070 SUPER
・OS:Linux Mint 19.3 Tricia ( Ubuntu 18.04 )
・Nvidia-driver:435.21
# conda list tensorflow
tensorflow 2.1.0 gpu_py37h7a4bb67_0
tensorflow-base 2.1.0 gpu_py37h6c5654b_0
tensorflow-estimator 2.1.0 pyhd54b08b_0
tensorflow-gpu 2.1.0 h0d30ee6_0
# conda list cudnn
cudnn 7.6.5 cuda10.1_0
# conda list cudatoolkit
cudatoolkit 10.1.243 h6bb024c_0
I can see the GPU by entering the following command
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
When I run the training script, I get the following error
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1d_3/convolution ......
How do I get it to work correctly?
Root cause: insufficient hardware resources.
Workaround:
I freshly installed TF 2.0 and ran a simple MNIST tutorial, which worked fine. When I opened another notebook and tried to run it, I encountered this issue.
I exited all notebooks, restarted Jupyter, opened only one notebook, and ran it successfully. The issue seems to be either memory or running more than one notebook on the GPU.
More reading here.
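A commonly suggested mitigation for this kind of memory contention, which is not part of the answer above, is to let TensorFlow allocate GPU memory on demand instead of reserving nearly all of it at startup. The sketch below assumes TensorFlow 2.x; the wrapper function name is hypothetical, while `tf.config.experimental.list_physical_devices` and `tf.config.experimental.set_memory_growth` are existing TensorFlow calls.

```python
def enable_memory_growth():
    """Ask TensorFlow to grow GPU memory use on demand.

    Returns the number of GPUs configured, or None if TensorFlow is not installed.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        # This must run before any GPU is initialized;
        # otherwise TensorFlow raises a RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


if __name__ == "__main__":
    print("GPUs configured:", enable_memory_growth())
```

Calling it at the top of each notebook, before building any model, is the usual placement. Whether it resolves the cuDNN initialization failure depends on how much memory the other processes already hold.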
I tried to run Keras with my GPU but got the following error:
C:\Python36\lib\site-packages\skimage\transform\_warps.py:84: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
warn("The default mode, 'constant', will be changed to 'reflect' in "
E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:378] Loaded runtime CuDNN library: 7102 (compatibility version 7100) but source was compiled with 7003 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
F C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\kernels\conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms(conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
I have TensorFlow 1.6, and my CUDA version is: Cuda compilation tools, release 9.0, V9.0.176
Does anyone know what's wrong here?
You need to install cuDNN 7.0.5. The file can be downloaded here. After clicking Download and agreeing to the terms, the option will be listed.
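The version numbers in the log can be decoded with a short sketch. It assumes the integer encoding used by cuDNN 7, major * 1000 + minor * 100 + patch. The "compatibility version" rule (round down to the nearest hundred) is inferred from the log itself, which pairs 7102 with 7100 and 7003 with 7000. The function names are hypothetical.

```python
def decode_cudnn_version(code):
    """Split an integer such as 7102 into (major, minor, patch)."""
    return code // 1000, (code % 1000) // 100, code % 100


def compatibility_version(code):
    """Drop the patch level, as the log does when it reports 7100 for 7102."""
    return code - code % 100


def is_compatible(loaded, compiled):
    """True when the loaded and compiled versions share a compatibility version."""
    return compatibility_version(loaded) == compatibility_version(compiled)


if __name__ == "__main__":
    loaded, compiled = 7102, 7003  # values taken from the error message
    print("loaded:", decode_cudnn_version(loaded))
    print("compiled:", decode_cudnn_version(compiled))
    print("compatible:", is_compatible(loaded, compiled))
```

Read this way, the loaded library is cuDNN 7.1.2 while TensorFlow was compiled against 7.0.3, so the two compatibility versions differ. A 7.0.x release such as the 7.0.5 recommended above encodes to 7005 and shares compatibility version 7000 with the compiled one.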