Install multiple versions of CUDA - python

I have an Ubuntu 18.04 VM with CUDA 10.2 already installed.
I need to run training code on a GPU, but when I run it I get errors like:
Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/lib64:/usr/local/cuda-10.2/lib64:
So I think I have to install CUDA 10.0.
Is it possible to have multiple versions of CUDA installed? How can I add CUDA 10.0?
I want to run my training on an NVIDIA GPU.
Edit: I succeeded in installing CUDA 10.0, downloaded cuDNN 7.4.2, and extracted the .tgz file into the cuda-10.0 folder. Now I get this:
I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
How can I solve this?

CUDA supports installation of multiple versions at the same time. Here is the CUDA 10.0 download archive link: https://developer.nvidia.com/cuda-10.0-download-archive
Once CUDA 10.0 is installed, you can make your code load its libraries by setting the environment variable LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64 before starting the program.
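The LD_LIBRARY_PATH setting above can also be applied per run from Python. This is a minimal sketch, not the only way to do it: the helper names env_for_cuda and run_with_cuda are made up for illustration, and it assumes the toolkit lives under /usr/local/cuda-10.0 as in the paths shown in the question. The script is started in a child process because the dynamic loader reads LD_LIBRARY_PATH when a process starts, so changing it inside an already running interpreter is generally too late.

```python
import os
import subprocess
import sys


def env_for_cuda(cuda_home, base_env=None):
    """Return a copy of the environment with <cuda_home>/lib64 first on LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    lib_dir = cuda_home.rstrip("/") + "/lib64"
    # Drop empty entries and any earlier copy of lib_dir, then put lib_dir first.
    existing = [
        p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p and p != lib_dir
    ]
    env["LD_LIBRARY_PATH"] = ":".join([lib_dir] + existing)
    return env


def run_with_cuda(script, cuda_home="/usr/local/cuda-10.0"):
    """Run a Python script in a child process that sees the chosen CUDA libraries."""
    return subprocess.run(
        [sys.executable, script], env=env_for_cuda(cuda_home), check=False
    )
```

For example, run_with_cuda("train.py") would start a (hypothetical) train.py with the CUDA 10.0 libraries searched first, while other programs on the machine keep using CUDA 10.2.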

Related

I tried to install tensor flow gpu but

I created an env in Anaconda and tried to install tensorflow-gpu. I had problems with my internet connection because I am in Iran, but after many attempts I managed to install TensorFlow. When I check for the GPU, however, I get an error. I installed CUDA and cuDNN, but they do not show up in my env; they are only installed in the root (base) env. I can't reinstall cuDNN and CUDA in Anaconda, and I don't know why.
import tensorflow as tf
if tf.test.gpu_device_name():
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
else:
    print("Please install GPU version of TF")
When I run this code I get these errors:
PS C:\Users\sajad\OneDrive\Desktop\ai> conda activate tf
PS C:\Users\sajad\OneDrive\Desktop\ai> & C:/Users/sajad/anaconda3/envs/tf/python.exe c:/Users/sajad/OneDrive/Desktop/ai/ai.py
2023-02-02 19:02:11.499160: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-02 19:02:11.526008: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublas64_11.dll'; dlerror: cublas64_11.dll not found
2023-02-02 19:02:11.526535: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublasLt64_11.dll'; dlerror: cublasLt64_11.dll not found
2023-02-02 19:02:15.512462: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusparse64_11.dll'; dlerror: cusparse64_11.dll not found
2023-02-02 19:02:15.515914: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Please install GPU version of TF
PS C:\Users\sajad\OneDrive\Desktop\ai>

Could not load library cudart64_110.dll with tensor flow gpu installation

W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
After this comes a traceback whose last lines are:
from tensorflow.summary import FileWriter
ImportError: cannot import name 'FileWriter' from 'tensorflow.summary' (C:\Users\HP\tetris-ai\venv\lib\site-packages\tensorboard\summary_tf\summary_init_.py)
After installing tensorflow-gpu again, I got this error:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.6.2 requires keras<2.7,>=2.6.0, but you have keras 2.7.0 which is incompatible.
tensorflow 2.6.2 requires tensorflow-estimator<2.7,>=2.6.0, but you have tensorflow-estimator 2.7.0 which is incompatible.
Successfully installed keras-2.7.0 tensorflow-estimator-2.7.0 tensorflow-gpu-2.7.0
But my issue with the DLL and the traceback error continued, both in VS Code and in PyCharm.
It could be that you need an NVIDIA GPU: CUDA is NVIDIA's GPU computing platform, so these libraries only work with NVIDIA hardware.
You can check whether you have one in Windows via Task Manager (the Performance tab lists the installed GPUs).
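The same check can be scripted. The sketch below assumes the NVIDIA driver is installed in a way that puts its nvidia-smi tool on PATH, and uses the tool's -L option, which prints one line per detected GPU. The function name list_nvidia_gpus is made up for illustration; an empty result only means that no working NVIDIA driver was found, not necessarily that there is no NVIDIA hardware.

```python
import shutil
import subprocess


def list_nvidia_gpus(which=shutil.which, run=subprocess.run):
    """Return the lines printed by `nvidia-smi -L`, or [] if no usable driver is found."""
    exe = which("nvidia-smi")
    if exe is None:
        return []  # driver tools not on PATH
    try:
        result = run([exe, "-L"], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []  # tool is present but could not talk to a GPU
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_nvidia_gpus()
    print("\n".join(gpus) if gpus else "No NVIDIA GPU detected via nvidia-smi")
```

The which and run parameters exist only so the function can be exercised without a GPU; normal callers can ignore them.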

Conda VE errors when installing "tensorflow"

When trying to set up TensorFlow in a conda virtual environment, I was getting a ton of errors. I have searched both here and elsewhere online, and they seem to be related to the GPU and VM versions of TensorFlow, which I didn't install.
W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
I am also getting a multitude of errors such as:
W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
and
I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
and also
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
All at the same time.
I have tried deleting and re-creating my conda environment, and I get the same error.
Details:
Python version 3.7
conda activate tensorflow
pip install python=3.7
Tensorflow version 2.6 (CPU version not GPU)
How it was fixed:
The issue was fixed by installing the CPU version of TensorFlow manually. https://www.pugetsystems.com/labs/hpc/TensorFlow-Installation-CPU-version-1129/
Issue:
pip was automatically installing the CUDA (GPU) version of TensorFlow, which did not work with my non-CUDA-enabled GPU.
If you get errors such as:
W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
or
W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
This may be your issue too.
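To see which TensorFlow build pip actually installed in the active environment, something like the following can help. It is only a sketch: it assumes the PyPI distribution names tensorflow, tensorflow-cpu and tensorflow-gpu, and the function name installed_tensorflow_packages is made up for illustration. It must be run with the Python interpreter of the environment being checked.

```python
from importlib import metadata

# PyPI distribution names to look for (an assumption, extend as needed).
CANDIDATES = ("tensorflow", "tensorflow-cpu", "tensorflow-gpu")


def installed_tensorflow_packages(names=CANDIDATES):
    """Map each installed distribution from `names` to its version string."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed in this environment
    return found


if __name__ == "__main__":
    print(installed_tensorflow_packages() or "No TensorFlow distribution found")
```

If more than one of these shows up, or the versions differ, a mixed installation is a plausible cause of the warnings above.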

tensorflow: Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found

When I run my program I see this error:
I have CUDA 11 and there is no such DLL file in the bin folder:
I use:
Python 3.8.7
CUDA 11.0
tensorflow 2.4.1
According to the table at https://www.tensorflow.org/install/source#gpu, these versions are compatible:
tensorflow-2.4.0 Python 3.6-3.8 CUDA 11.0
Does anyone know how I can fix this problem? What should I download or reinstall?
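The file name cudnn64_8.dll suggests that TensorFlow is looking for cuDNN 8, which NVIDIA distributes separately from the CUDA toolkit, so it would not appear in the CUDA bin folder unless it was copied there. As a first diagnostic, the sketch below checks whether a given file is present in any directory on PATH. It assumes the library is located through PATH, and the function name find_on_path is made up for illustration.

```python
import os


def find_on_path(filename, path=None):
    """Return every directory on the search path that contains `filename`."""
    raw = os.environ.get("PATH", "") if path is None else path
    hits = []
    for directory in raw.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    for dll in ("cudart64_110.dll", "cudnn64_8.dll"):
        print(dll, "->", find_on_path(dll) or "not found on PATH")
```

An empty result for cudnn64_8.dll while cudart64_110.dll is found would point to a missing or misplaced cuDNN installation rather than a CUDA problem.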

Could not load dynamic library 'libnvinfer.so.6'

I am trying to import the TensorFlow Python package normally, but I get the following error:
Here is the text from the terminal:
2020-02-23 19:01:06.163940: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-02-23 19:01:06.164019: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-02-23 19:01:06.164030: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
<module 'tensorflow_core._api.v2.version' from '/home/saman/miniconda3/envs/testconda/lib/python3.7/site-packages/tensorflow_core/_api/v2/version/__init__.py'
This is a warning, not an error. You can still use TensorFlow. The shared libraries libnvinfer and libnvinfer_plugin are optional and are required only if you are using NVIDIA's TensorRT capabilities.
To suppress this and all other warnings, set the environment variable TF_CPP_MIN_LOG_LEVEL="2".
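The variable can also be set from Python, as long as that happens before TensorFlow is imported. A minimal sketch:

```python
import os

# 0 = show all messages, 1 = hide INFO, 2 = hide INFO and WARNING,
# 3 = hide ERROR messages as well.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

# Import TensorFlow only after the variable is set:
# import tensorflow as tf
```

Note that this hides the messages without changing whether the libraries are found.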
TensorFlow's installation instructions list the GPU dependencies (current as of December 13 2022):
The following NVIDIA® software are only required for GPU support.
NVIDIA® GPU drivers version 450.80.02 or higher.
CUDA® Toolkit 11.2.
cuDNN SDK 8.1.0.
(Optional) TensorRT to improve latency and throughput for inference.
I got this warning after an (accidental) update of the libnvinfer6 package. It was updated to 6.0.1-1+cuda10.2, while the original installation used 6.0.1-1+cuda10.1.
After I uninstalled the packages referencing cuda10.2 and re-ran
sudo apt-get install -y --no-install-recommends libnvinfer6=6.0.1-1+cuda10.1 \
libnvinfer-dev=6.0.1-1+cuda10.1 \
libnvinfer-plugin6=6.0.1-1+cuda10.1
this warning went away.
Most of these messages are warnings, not errors. They just mean that the libraries needed to use an NVIDIA GPU are not installed. You don't need an NVIDIA GPU to use TensorFlow, so you can do without these libraries. The comment by jakub explains how to turn off the warnings:
export TF_CPP_MIN_LOG_LEVEL="2"
However, I too run TensorFlow without any NVIDIA hardware, and there is one more message that is logged as an error, not a warning:
2020-04-10 10:04:13.365696: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
It should be irrelevant because it also refers to CUDA, which is NVIDIA-only. It doesn't seem to be a fatal error, though.
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update
A little extra information on jakub's answer: this can occur if you haven't installed the 'machine-learning' repo. Try the commands above if you have already installed CUDA successfully but are still getting the error.
Then install TensorRT. This requires that libcudnn7 is already installed.
sudo apt-get install -y --no-install-recommends libnvinfer6=6.0.1-1+cuda10.1 \
libnvinfer-dev=6.0.1-1+cuda10.1 \
libnvinfer-plugin6=6.0.1-1+cuda10.1
You can download the TensorRT 6 zip file and copy the contents of its x86 Linux folder to /usr/lib/cuda. Make sure the lib folder inside the downloaded x86_linux folder is renamed to lib64 first. After copying all the files into the cuda directory, reboot the system. CUDA and TensorRT should then work on your system.
I spent about 5 hours solving this issue. In my case, it meant that I had the wrong version of the library. libnvinfer.so.6 is located in 'TensorRT-*/lib', and the number 6 means TensorFlow is looking for the libnvinfer of TensorRT 6. So if the message is "could not load dynamic library libnvinfer.so.5", you need TensorRT 5 to run the code.
Likewise, if it shows Could not load dynamic library 'libcudart.so.10.0', you need the library from CUDA 10.0 to run the code.
So updating TensorRT/CUDA/cuDNN to match your TensorFlow version should help. Note that your TensorRT, CUDA, and cuDNN versions must also be compatible with each other.
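The rule of thumb above (the number after .so is the version TensorFlow expects) can be turned into a small log parser. This is a sketch that only handles the Linux-style messages quoted on this page; the function name missing_library is made up for illustration.

```python
import re

# Matches e.g. "Could not load dynamic library 'libnvinfer.so.6'" and
# "Could not dlopen library 'libcudart.so.10.0'".
_MISSING_LIB = re.compile(
    r"Could not (?:load dynamic|dlopen) library "
    r"'(?P<name>lib[\w-]+)\.so\.(?P<version>\d+(?:\.\d+)*)'"
)


def missing_library(log_line):
    """Return (library name, expected version) from a log line, or None if it does not match."""
    match = _MISSING_LIB.search(log_line)
    if match is None:
        return None
    return match.group("name"), match.group("version")


if __name__ == "__main__":
    line = "Could not load dynamic library 'libnvinfer.so.6'; dlerror: ..."
    print(missing_library(line))
```

For the two messages mentioned above, the parser returns libnvinfer with version 6 (TensorRT 6) and libcudart with version 10.0 (CUDA 10.0).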
