Persisting CUDA error with tensorflow in WSL - python

I'm trying to make TensorFlow use the NVIDIA GTX 1060 GPU in my laptop. I created a Python environment and installed tensorflow, python, pip, etc. I am using Ubuntu on Windows (so WSL-Ubuntu). In CMD, the nvidia-smi command shows my GPU, but with TensorFlow I get the following error:
2022-01-26 21:45:36.677191: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2022-01-26 21:45:36.678074: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (DESKTOP-P8QAQC0): /proc/driver/nvidia/version does not exist
Num GPUs Available: 0
I have CUDA 11.5 and 11.6 installed, with cuDNN 8.3.2.44. I manually copied the cuDNN files into the CUDA directory and ran the installer exe (though the exe didn't seem to install any files). I am not sure what else to do. Help would be really appreciated!
EDIT: I'm on Windows 10, and I changed my CUDA installation to 11.2 and cuDNN to 8.1. The issue is still there. Both are installed under C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA. I'm not sure if that's the problem, since I didn't install them directly in WSL.
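The second log line points at /proc/driver/nvidia/version being absent, which inside WSL usually means the Windows-side driver isn't being passed through to the distro at all. A stdlib-only diagnostic sketch (the function name is my own) that can be run inside WSL to narrow this down:

```python
import os

def check_wsl_gpu_passthrough():
    """Check for the device nodes WSL2 exposes when GPU passthrough works.

    /dev/dxg is the WSL2 GPU paravirtualization device; the TensorFlow
    error above complains about /proc/driver/nvidia/version.
    """
    paths = ["/dev/dxg", "/proc/driver/nvidia/version"]
    return {path: os.path.exists(path) for path in paths}

if __name__ == "__main__":
    for path, present in check_wsl_gpu_passthrough().items():
        print(f"{path}: {'found' if present else 'MISSING'}")
```

If /dev/dxg is missing, the problem is at the WSL/driver level (WSL2 plus a recent WSL-enabled Windows NVIDIA driver is required) rather than in TensorFlow; a CUDA toolkit installed on the Windows side is not automatically visible inside the distro.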

Related

getting tensorflow to run on GPU

I've been trying to get this to work forever and still no luck
I have:
GTX 1050 Ti (on Lenovo Legion laptop)
the laptop also has an Intel UHD Graphics 630 (I'm not sure if maybe this is interfering?)
Anaconda
Visual Studio
Python 3.9.13
CUDA 11.2
cuDNN 8.1
I added these to the PATH:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\libnvvp
finally, I installed TensorFlow in its own environment
and I still can't get it to detect my GPU.
I basically followed https://www.youtube.com/watch?v=hHWkvEcDBO0&t=295s
and I'm still having no luck.
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
yields only information on the CPU
Can anyone please help?
You can upgrade tensorflow to 2.0. It should solve your problem.
Check your tensorflow version and its compatibility with your GPU, and update your GPU drivers. CUDA 9/10 would do the job.
follow the official tensorflow link:
https://www.tensorflow.org/install/pip#windows-native_1
Do all the steps in the same environment in anaconda.
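To rule out a typo in the two PATH entries listed in the question, a small stdlib-only sketch (the function name and default version argument are illustrative) can confirm that each directory both exists on disk and actually appears on PATH:

```python
import os

def check_cuda_path_entries(version="v11.2"):
    """Report, for each CUDA directory the question adds to PATH,
    whether it exists on disk and is actually present in PATH."""
    root = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA"
    expected = [os.path.join(root, version, "bin"),
                os.path.join(root, version, "libnvvp")]
    path_entries = os.environ.get("PATH", "").split(os.pathsep)
    return {d: {"exists": os.path.isdir(d), "on_path": d in path_entries}
            for d in expected}

if __name__ == "__main__":
    for directory, status in check_cuda_path_entries().items():
        print(directory, status)
```

If either directory reports exists=False or on_path=False, TensorFlow will not be able to load the CUDA DLLs no matter which versions are installed.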

Tensorflow: dlerror: cudnn64_8.dll not found but it appears to exist

I know this seems to be a common question, but I can't seem to find a thread specific to my issue.
I am running Windows 10, with a GTX 1050.
I am trying to install tensorflow 2.5 according to this tutorial: https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/install.html#tf-install
I've installed CUDA 11.2 and CuDNN 8.1.0.
I've installed the correct CuDNN version and CUDA according to the tutorial and my computer settings, and I've checked that I have the cudnn64_8.dll in my cuda folder: D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\cuda\bin
I'm running a python venv for tensorflow.
I've also made sure that the PATHS are updated, and I've also restarted my terminal and computer.
I'm confused as to why the .dll file can't be found.
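One way to see whether cudnn64_8.dll is actually reachable is a stdlib-only scan of PATH (the helper name is my own). Note that the folder quoted above ends in ...\CUDA\v11.2\cuda\bin, i.e. the cuDNN archive's inner cuda folder, which may or may not be the directory that was added to PATH; the file merely existing somewhere on disk is not enough:

```python
import os

def find_dll_on_path(dll_name="cudnn64_8.dll"):
    """Return every PATH entry that actually contains dll_name.

    On Windows the loader resolves DLLs via PATH (and, on Python 3.8+,
    directories registered with os.add_dll_directory), so an empty
    result explains the 'dlerror: ... not found' message.
    """
    hits = []
    for entry in os.environ.get("PATH", "").split(os.pathsep):
        if entry and os.path.isfile(os.path.join(entry, dll_name)):
            hits.append(entry)
    return hits

if __name__ == "__main__":
    print(find_dll_on_path() or "cudnn64_8.dll not found on any PATH entry")
```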

jupyter notebook InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version

Hi everyone. I recently set out to run Mask R-CNN. After installing CUDA + cuDNN + tensorflow-gpu on Win10, I found that PyCharm can run the program, but in an Anaconda Jupyter notebook I get the error: InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version. I checked my CUDA toolkit and the other versions and found no problem.
graphics card NVIDIA GTX950M
graphics driver 442.19
CUDA 9.0
Cudnn 7
tensorflow-gpu 1.9.0
I suspect it's a problem with Anaconda's environment, and I find CUDA's versioning a bit confusing.
I hope someone can help me. I really like notebooks.
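Since the suspicion is an Anaconda environment mix-up (PyCharm and the Jupyter kernel may be using different interpreters), a stdlib-only snippet run in a notebook cell can show which Python the kernel uses and where tensorflow would be imported from. The function name is my own:

```python
import importlib.util
import sys

def report_environment(module="tensorflow"):
    """Show the running interpreter and where `module` would load from.

    If this path differs from the environment where tensorflow-gpu and
    the matching CUDA libraries were installed, the kernel is running
    in the wrong environment.
    """
    spec = importlib.util.find_spec(module)
    return {
        "python": sys.executable,
        "module_found": spec is not None,
        "module_location": spec.origin if spec else None,
    }

if __name__ == "__main__":
    print(report_environment())
```

Comparing the printed "python" path in the notebook with the interpreter PyCharm uses will confirm or rule out the environment theory.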

Getting Torch to recognize GPU

How do you get Torch to recognize CUDA on your video card?
I have a Nvidia GeForce GT 1030 running under Ubuntu 18.04, and it claims to support CUDA, yet when I first tested Torch with it by running:
virtualenv -p python3.7 .env
. .env/bin/activate
pip install torch
python -c "import torch; print(torch.cuda.is_available())"
it returned False, along with the warning:
The NVIDIA driver on your system is too old (found version 9010).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.
So I ran all system updates and used Ubuntu's proprietary driver installer to install the most recent Nvidia-435 driver for my card.
However, torch.cuda.is_available() still returns false, but now it doesn't give me any warning.
Have I mis-configured Torch or does my GPU just not support CUDA?
Never mind, I spoke too soon. I didn't reboot after switching over the driver, and apparently that broke nvidia-smi and some other things that loaded the CUDA driver. After the reboot, Torch now recognizes CUDA 10.1 support.
Yeah I checked this link and the GT 1030 is not compatible.
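For anyone scripting the post-reboot check described above, the driver's visibility can be probed without importing torch by calling nvidia-smi directly (a stdlib-only sketch; the helper name is my own). If this returns False right after a driver switch, a reboot is the usual fix:

```python
import shutil
import subprocess

def nvidia_driver_responds():
    """Return True if nvidia-smi runs successfully, False if it fails,
    and None if the tool isn't on PATH at all."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        result = subprocess.run([exe], capture_output=True, timeout=15)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0

if __name__ == "__main__":
    print("driver responds:", nvidia_driver_responds())
```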

tensorflow transition to gpu version

I've worked with tensorflow for a while and everything worked properly until I tried to switch to the GPU version.
Uninstalled previous tensorflow,
pip installed tensorflow-gpu (v2.0)
downloaded and installed visual studio community 2019
downloaded and installed CUDA 10.1
downloaded and installed cuDNN
tested with the CUDA sample "deviceQuery_vs2019" and got a positive result (test passed)
NVIDIA GeForce RTX 2070
Running a previously working file gives the error:
tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: cudaGetErrorString symbol not found.
After some research I found that the supported CUDA version is 10.0,
so I downgraded and changed the CUDA path, but nothing changed.
Using this code:
import tensorflow as tf
print("Num GPUs Available: ",
len(tf.config.experimental.list_physical_devices('GPU')))
i get
2019-10-01 16:55:03.317232: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-10-01 16:55:03.420537: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2019-10-01 16:55:03.421029: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-10-01 16:55:03.421849: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
Num GPUs Available: 1
[Finished in 2.01s]
CUDA seems to recognize the card, and so does tensorflow, but I cannot get rid of the error:
tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: cudaGetErrorString symbol not found.
What am I doing wrong? Should I stick with CUDA 10.0? Am I missing a piece of the installation?
SOLVED: it's mostly an alchemy of versions to avoid conflicts.
Here's what I've done (order matters, as far as I know):
uninstall everything (tf, cuda, visual studio)
pip install tensorflow-gpu
download and install visual studio community 2017 (2019 won't work)
I also installed the C++ workload from Visual Studio (not sure if it's necessary, but it includes the required compiler, Visual C++ 15.x)
download and install cuda 10.0 (the one i have is 10.0.130)
go to System Environment Variables (search for it in the Windows bar) > Advanced > click Environment Variables...
create a New user variable (do not confuse it with a system variable):
Variable name: CUDA_PATH
Variable value: browse to the CUDA directory down to the version directory (mine is C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0)
The guide says you need cuDNN 7.4.1, but I got an error saying the expected version is 7.6 minimum. Go to the NVIDIA Developer cuDNN archive and download "cuDNN v7.6.0 for CUDA 10.0" (be sure you get the right file). Unzip it and put the cuDNN files into the corresponding CUDA directories (lib, include, bin).
From there everything worked like a charm. I haven't been able to build the CUDA sample (deviceQuery) from Visual Studio, but it's not a vital step.
Almost every error was due to incompatible versions of the files; it took me 3-4 days to figure out the right mix. Hope that helps :)
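After the restart, the CUDA_PATH user variable from the steps above can be sanity-checked with a few lines of stdlib Python (the helper name is my own):

```python
import os

def check_cuda_path():
    """Verify that CUDA_PATH is set and points at a directory that
    actually contains a bin/ subfolder, as a CUDA install should."""
    root = os.environ.get("CUDA_PATH")
    if root is None:
        return {"set": False, "valid": False, "value": None}
    return {
        "set": True,
        "valid": os.path.isdir(os.path.join(root, "bin")),
        "value": root,
    }

if __name__ == "__main__":
    print(check_cuda_path())
```

A result of set=True but valid=False usually means the variable points one level too high or too low in the CUDA directory tree.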
tensorflow-gpu v2.0.0 is now available on conda and is very easy to install with:
conda install -c anaconda tensorflow-gpu
No additional downloads or CUDA installs required.
I had similar problems, combined with the fact that I am using Windows 8 and PyCharm, but I figured it out eventually using this post.
The combination that worked:
Cuda 10
CuDNN 7.6 for Windows 7
Tensorflow-gpu 2.0
then setting the PATH environment variable as described above.
It's important to restart after setting environment variables ;)
I did not think that tensorflow 2.2 would not be able to use CUDA 11...
