I want to use the GPU from an Anaconda environment on Linux.
I believe I have matched the versions of each package, but it doesn't work.
CUDA and cuDNN were installed with conda.
The versions of each module and driver are listed below:
・GPU:RTX 2070 SUPER
・OS:Linux Mint 19.3 Tricia ( Ubuntu 18.04 )
・Nvidia-driver:435.21
# conda list tensorflow
tensorflow 2.1.0 gpu_py37h7a4bb67_0
tensorflow-base 2.1.0 gpu_py37h6c5654b_0
tensorflow-estimator 2.1.0 pyhd54b08b_0
tensorflow-gpu 2.1.0 h0d30ee6_0
# conda list cudnn
cudnn 7.6.5 cuda10.1_0
# conda list cudatoolkit
cudatoolkit 10.1.243 h6bb024c_0
I can see the GPU by running the following:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
When I run the training script, I get the following error:
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1d_3/convolution ......
How do I get it to work correctly?
Root cause: lack of hardware resources.
Workaround:
I freshly installed TF 2.0 and ran a simple MNIST tutorial, which worked fine. Then I opened another notebook, tried to run it, and hit this issue.
I closed all notebooks, restarted Jupyter, opened only one notebook, and it ran successfully. The issue seems to be either memory or running more than one notebook on the GPU at once.
More reading here.
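If the cause is indeed GPU memory contention between notebooks, one commonly used mitigation is to ask TensorFlow to allocate GPU memory on demand instead of reserving most of it at start-up. A minimal sketch, assuming TensorFlow 2.x; the helper name `enable_memory_growth` is made up for illustration, and it has to run before any other TensorFlow operation touches the GPU:

```python
def enable_memory_growth():
    """Ask TensorFlow to grow GPU memory on demand; return the GPU names found."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment; nothing to configure.
        return []
    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        # Must be called before the GPU is initialized, otherwise TF raises RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return [gpu.name for gpu in gpus]


print(enable_memory_growth())
```

This does not add memory; it only stops one notebook from claiming nearly all of it, so two large models can still fail to fit side by side.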
I have been trying to enable CUDA to run PyMC3 with the assistance of the GPU. Here are the specs of the machine/software I have been using:
Windows 10
Visual Studio Community 2019
Python 3.8.12
CUDA 10.2 (I tried 11.2 before that and obtained the same problem)
CuDNN 7.6.5 (I tried 8.1 with CUDA 11.2 and obtained the same problem)
TensorFlow 2.7.0
Theano-PyMC 1.1.2
Aesara 2.3.2 (the successor to Theano)
PyMC3 3.11.4
MKL 2.4.0
For the proper installation of Theano and CUDA in a Windows environment, I followed the advice provided on these web pages:
https://gist.github.com/ElefHead/93becdc9e99f2a9e4d2525a59f64b574
https://towardsdatascience.com/installing-tensorflow-with-cuda-cudnn-and-gpu-support-on-windows-10-60693e46e781
I have tested the installation against Tensorflow and it works. I have also used the tests provided on the Theano and Aesara "Read the Docs" sites (https://aesara.readthedocs.io/en/latest/tutorial/using_gpu.html#testing-the-gpu) and ran the check_blas test provided with Theano/Aesara (https://raw.githubusercontent.com/Theano/Theano/master/theano/misc/check_blas.py). After all this, I still get these disappointing error/warning messages:
WARNING (aesara.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
UserWarning: Your cuDNN version is more recent than Aesara. If you encounter problems, try updating Aesara or downgrading cuDNN to a version >= v5 and <= v7
even though I have already downgraded cuDNN to 7.6.5 (and, obviously, can't use the GPU with Theano/Aesara/PyMC3).
With respect to the BLAS warning, I tried setting blas__ldflags (Aesara) or blas.ldflags (Theano) as environment variables, assigning them the recommended MKL values -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lguide -liomp5 -lmkl_mc -lpthread, but still nothing works.
Can anybody please help me address these two issues?
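One thing that may be worth checking: as far as I know, Theano and Aesara do not read `blas.ldflags` / `blas__ldflags` as standalone environment variables. Configuration passed through the environment goes into a single `THEANO_FLAGS` or `AESARA_FLAGS` variable as comma-separated `key=value` pairs, and it has to be set before the library is imported. A minimal sketch; the helper `build_flags` is hypothetical, and the linker flags are copied from the question, not verified against the installed MKL:

```python
import os


def build_flags(options):
    """Join config options into the comma-separated form used by AESARA_FLAGS."""
    return ",".join(f"{key}={value}" for key, value in options.items())


# Linker flags taken verbatim from the question (not verified here).
mkl_flags = (
    "-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core "
    "-lguide -liomp5 -lmkl_mc -lpthread"
)

# Set this before `import aesara`, because the flags are read at import time.
# For Theano, the variable is THEANO_FLAGS and the key is `blas.ldflags`.
os.environ["AESARA_FLAGS"] = build_flags({"blas__ldflags": mkl_flags})

print(os.environ["AESARA_FLAGS"])
```

The same setting can also live in the library's config file instead of the environment, which avoids depending on the order of imports.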
I am trying to run tensorflow-gpu, and it works in the Python shell:
>>> import tensorflow as tf
>>> if tf.test.gpu_device_name():
...     print('Default GPU Device:{}'.format(tf.test.gpu_device_name()))
... else:
...     print("Please install GPU version of TF")
Output:
Default GPU Device:/device:GPU:0
But when I run the same code in Jupyter, I get this:
Output:
Please install GPU version of TF
https://i.imgur.com/tP4uHzA.png
I am using Anaconda. I installed tensorflow-gpu with both the conda installer and pip3/pip, but it does not work in Jupyter. Does anybody know what is wrong here? I have installed the CUDA toolkit and cuDNN, and the Nvidia drivers are up to date.
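A frequent cause of this kind of mismatch is that the Jupyter kernel runs a different Python interpreter than the shell where TensorFlow works. A small check, using only the standard library, that can be run both in the shell and in a notebook cell to compare the results; the function name `describe_interpreter` is just for illustration:

```python
import importlib.util
import sys


def describe_interpreter(module_name="tensorflow"):
    """Report which Python is running and where a module would be imported from."""
    # find_spec locates a top-level module without importing it.
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "module_location": spec.origin if spec is not None else None,
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

If the two outputs differ, the notebook is not using the environment where tensorflow-gpu was installed; launching Jupyter from inside that environment, or registering the environment as a kernel, is the usual next step.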
I'm relatively new to working with TensorFlow and Keras, and I want to move my ANN training to the GPU. I tried some tutorials on YouTube, and they didn't work for me. So I tried the simplest one I could find; the link is attached below.
Video tutorial on YouTube
I also changed some of the installed packages in the Anaconda environment, because some guides suggested the problem could be conflicting versions of cuDNN and CUDA. Here are some of the packages installed in the environment:
cudatoolkit 10.0.130 0
cudnn 7.6.5 cuda10.0_0
keras-applications 1.0.8 py_0
keras-preprocessing 1.1.0 py_1
tensorflow 1.14.0 gpu_py36h305fd99_0
tensorflow-base 1.14.0 gpu_py36h55fc52a_0
tensorflow-estimator 1.14.0 py_0
tensorflow-gpu 1.14.0 h0d30ee6_0
When I run the following in the IPython console (from Spyder), this message always appears:
In [2]: from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
Have you updated your GPU driver? See this table for the minimum version.
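For reference, the check behind that error can be sketched as a simple version comparison. The minimum Linux driver versions below are the ones I recall from NVIDIA's CUDA release notes (Windows minimums differ), so treat the table as an assumption and verify it against the official one; `driver_is_sufficient` is a hypothetical helper, and the two driver versions are example values only:

```python
# Minimum Linux x86_64 driver per CUDA toolkit release (assumed from NVIDIA's
# CUDA release notes; verify before relying on it).
MIN_LINUX_DRIVER = {
    "9.0": (384, 81),
    "10.0": (410, 48),
    "10.1": (418, 39),
    "10.2": (440, 33),
}


def parse_version(text):
    """Turn '435.21' into (435, 21) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def driver_is_sufficient(driver_version, cuda_version):
    """Compare an installed driver against the minimum for a CUDA toolkit."""
    major_minor = ".".join(cuda_version.split(".")[:2])
    return parse_version(driver_version) >= MIN_LINUX_DRIVER[major_minor]


# cudatoolkit 10.0.130 as in the package list, with two example drivers.
print(driver_is_sufficient("435.21", "10.0.130"))  # True
print(driver_is_sufficient("396.54", "10.0.130"))  # False
```

The installed driver version is the one reported by `nvidia-smi`.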
The easier way to run TensorFlow on your GPU is to use the container from NGC: https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow
I've updated the torch version in my SageMaker pytorch_36 kernel to torch version 1.0.0. I then tried running the example notebook pytorch_torchvision_neo.ipynb, also changing the framework_version to 1.0.0. Neo compilation then fails.
Any idea why it isn't working with 1.0.0? The console error message actually tells me to make sure I'm using 1.0.0, but the example notebook seems to only work with 0.4.0.
The SageMaker notebook has pytorch-1.1.0 pre-installed,
but the model compilation service expects a model saved by pytorch-0.4.0 or pytorch-1.0.1.
Solution to the issue:
# 1. do not install `pytorch-cpu` and `torchvision-cpu`.
# 2. Downgrade pytorch version to 1.0.1
!conda install -y pytorch=1.0.1 -c pytorch
# 3. import pytorch and check that version is 1.0.1 (but not 1.1.0)
import torch
torch.__version__
Continue to run notebook steps: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_neo_compilation_jobs/pytorch_torchvision/pytorch_torchvision_neo.ipynb
I want to install TensorFlow on a Tesla K40 GPU. I got the CUDA version using
cat /usr/local/cuda/version.txt
and the cuDNN version is 7.1.4.
When I referred to the TensorFlow documentation, I couldn't find any TensorFlow version suitable for my versions.
I tried installing a lower cuDNN version (5 to 6.1) and got the following error:
ImportError: libcublas.so.8.0: cannot open shared object file: No such file or directory
I'm well aware that TensorFlow is looking for CUDA 9.0. I cannot upgrade CUDA because I am using a shared GPU server. Any help is appreciated.
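Since the installed CUDA version decides which prebuilt TensorFlow releases can load, it can help to read it programmatically before picking a version. A minimal sketch, assuming the older `version.txt` layout with a line of the form `CUDA Version X.Y.Z` (newer toolkits ship a different file); the function names are hypothetical:

```python
import re
from pathlib import Path


def parse_cuda_version(text):
    """Extract 'X.Y.Z' from a line such as 'CUDA Version X.Y.Z'."""
    match = re.search(r"CUDA Version\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def read_cuda_version(path="/usr/local/cuda/version.txt"):
    """Return the CUDA version string from version.txt, or None if unavailable."""
    version_file = Path(path)
    if not version_file.is_file():
        return None
    return parse_cuda_version(version_file.read_text())


print(read_cuda_version())
```

The result can then be compared with the CUDA version listed for each TensorFlow release in the tested build configurations table.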