AssertionError: Torch not compiled with CUDA enabled - python

I want to run this repo. I installed everything that is needed for this project.
I am on Windows 8.1 and it seems I don't have an NVIDIA GPU (Device Manager lists these display adapters: AMD Radeon HD 7660G + 7670M Dual Graphics and AMD Radeon HD 7670M).
I installed torch with the command given on the PyTorch website:
pip install torch==1.6.0+cpu torchvision==0.7.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
But when I run the project I get the error: AssertionError: Torch not compiled with CUDA enabled.
Then I tried installing torch with CUDA enabled:
pip install torch===1.6.0 torchvision===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html
But when I run the project I get the error: AssertionError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from https://www.nvidia.com/Download/index.aspx.
Please, help me to solve my issue and run the project without errors.

I have already fixed this issue. The problem was in the source code, which uses
opt.device
in main.py and violin_dataset.py, but config.py declares it as
opt.device = torch.device('cuda:0')
regardless of whether CUDA support is available.
So I changed it to
opt.device = torch.device('cpu')
And everything works fine now.
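A more portable variant of this fix is to choose the device at runtime instead of hardcoding either value. The following is only a minimal sketch: pick_device_name is a hypothetical helper that is not part of the repo, and the try/except around the import exists only so the snippet also runs where PyTorch is missing.

```python
# Hypothetical replacement for the hardcoded device in config.py.

def pick_device_name(cuda_available: bool) -> str:
    """Return 'cuda:0' when CUDA is usable, otherwise 'cpu'."""
    return "cuda:0" if cuda_available else "cpu"


try:
    import torch
except ImportError:
    torch = None  # keeps the sketch runnable without PyTorch installed

if torch is not None:
    # torch.cuda.is_available() is False on CPU-only builds and on
    # machines without a working NVIDIA driver.
    device = torch.device(pick_device_name(torch.cuda.is_available()))
else:
    device = None

print(device)
```

In config.py the result would be assigned to opt.device, so the same code runs on machines with and without an NVIDIA GPU.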

Related

jax woes (on an NVIDIA DGX box, no less)

I am trying to run jax on an nvidia dgx box, but am failing miserably, thus:
>>> import jax
>>> import jax.numpy as jnp
>>> x = jnp.arange(10)
2021-10-25 13:00:05.863667: W
external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:80] Couldn't
get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2021-10-25 13:00:05.864713: F
external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:435]
ptxas returned an error during compilation of ptx to sass: 'INTERNAL: Failed to
launch ptxas' If the error message indicates that a file could not be written,
please verify that sufficient filesystem space is provided.
Aborted (core dumped)
Any suggestions would be much appreciated.
This means that your CUDA installation is not configured correctly, and can generally be fixed by ensuring that the CUDA toolkit binaries (including ptxas) are present in your $PATH. See https://github.com/google/jax/discussions/6843 and https://github.com/google/jax/issues/7239 for responses to users reporting similar issues.
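To check whether ptxas is actually reachable before importing jax, something like the sketch below can help. It uses only the standard library; the fallback directory /usr/local/cuda/bin is an assumption about where the toolkit lives, and find_ptxas is a made-up helper name.

```python
import os
import shutil


def find_ptxas(extra_dirs=("/usr/local/cuda/bin",)):
    """Return a path to ptxas, or None if it cannot be found.

    PATH is searched first; extra_dirs are assumed install locations
    that are checked only as a fallback.
    """
    found = shutil.which("ptxas")
    if found:
        return found
    for directory in extra_dirs:
        candidate = os.path.join(directory, "ptxas")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


path = find_ptxas()
if path is None:
    print("ptxas not found; install the CUDA toolkit or fix PATH")
elif shutil.which("ptxas") is None:
    print(f"ptxas exists at {path}, but its directory is not on PATH")
else:
    print(f"ptxas is on PATH at {path}")
```

If the second message is printed, adding that directory to PATH in the shell that launches Python should be enough.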
For this problem you need to install the NVIDIA driver, CUDA and cuDNN correctly. The risky command here is sudo apt install nvidia-cuda-toolkit; avoid it if you have already installed those three.
The way that works for me:
Install the NVIDIA driver: follow this and pick the proper version. On Ubuntu you can try sudo ubuntu-drivers devices.
Install CUDA: to find which CUDA version works for you, run nvidia-smi and read the CUDA version shown in the header at the top, then go to the NVIDIA CUDA archive and follow the instructions there.
At this point you should see a cuda folder when you run ls /usr/local. If you also want to install the headers, the NVIDIA installation guide for CUDA has a useful command.
Install cuDNN, which amounts to copying some files into the /usr/local/cuda directory; the NVIDIA cuDNN guide describes the best way.
As the last step, point to the CUDA path (/usr/local/cuda if you followed the steps above). For example, if you use Docker you need to mount it like here. Avoid installing nvidia-cuda-toolkit, since it would remove your previous installation; instead you can install cuda-nvcc in a conda env with conda install -c nvidia cuda-nvcc, which doesn't interfere with your CUDA installation.

Getting Torch to recognize GPU

How do you get Torch to recognize CUDA on your video card?
I have a Nvidia GeForce GT 1030 running under Ubuntu 18.04, and it claims to support CUDA, yet when I first tested Torch with it by running:
virtualenv -p python3.7 .env
. .env/bin/activate
pip install torch
python -c "import torch; print(torch.cuda.is_available())"
it returned False, along with the warning:
The NVIDIA driver on your system is too old (found version 9010).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.
So I ran all system updates and used Ubuntu's proprietary driver installer to install the most recent Nvidia-435 driver for my card.
However, torch.cuda.is_available() still returns False, though now without any warning.
Have I mis-configured Torch or does my GPU just not support CUDA?
Nevermind. I spoke too soon. I didn't reboot after switching over the driver, and apparently that broke nvidia-smi and some other things that loaded the CUDA driver. After the reboot, Torch now recognizes CUDA 10.1 support.
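A quick way to tell whether the driver is really loaded (for example after switching drivers without rebooting) is to see whether nvidia-smi runs cleanly. This is a rough sketch using only the standard library; driver_status is a hypothetical helper, and a non-zero exit code is treated as "driver not usable", which is an assumption rather than documented behaviour.

```python
import shutil
import subprocess


def driver_status():
    """Return (ok, message) describing whether nvidia-smi can reach the driver."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False, "nvidia-smi not found on PATH"
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, f"could not run nvidia-smi: {exc}"
    if result.returncode != 0:
        # Assumption: a failing nvidia-smi means the driver is not usable yet.
        lines = (result.stdout or result.stderr).strip().splitlines()
        return False, lines[0] if lines else "nvidia-smi failed"
    return True, "driver loaded"


ok, message = driver_status()
print(ok, message)
```

If this reports a failure right after a driver change, rebooting before re-testing torch.cuda.is_available() is the first thing to try.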
Yeah I checked this link and the GT 1030 is not compatible.

Do I need to install CUDA driver for tensorflow-gpu manually if I install tf through conda

I followed this tutorial and installed tf-gpu using conda (https://www.pugetsystems.com/labs/hpc/The-Best-Way-to-Install-TensorFlow-with-GPU-Support-on-Windows-10-Without-Installing-CUDA-1187/) and it worked, because I see "...gpu:0" in my printed log. I already had the CUDA driver installed before doing this, so I am not sure whether it is needed.
It seems to me that conda install tensorflow-gpu comes with the CUDA toolkit, cuDNN, etc. I was wondering whether installing the CUDA driver is a required step. Another post I found didn't mention the driver either (https://towardsdatascience.com/tensorflow-gpu-installation-made-easy-use-conda-instead-of-pip-52e5249374bc). But the official GPU guide says it's required, so I am confused. I am doing this on Windows 10.
In my experience you do not need to install cuda or cudnn. Just your graphics driver is enough.
But depending on your system it might not be optimized. For that you would need to compile tensorflow from scratch and optimize it for your system.
Depends on the machine you are running on. For example, you can configure a Google Deep Learning VM to install the NVIDIA driver on startup.
If the driver is not installed, then follow the Tensorflow instructions on how to install the NVIDIA driver. Here are the instructions for Linux. Note that you only need to install the driver, and not the toolkit.
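Whichever way the driver ends up installed, you can confirm that TensorFlow sees the GPU without digging through the log. This is a small sketch that assumes TensorFlow 2.1 or newer, where tf.config.list_physical_devices is available; visible_gpus is a hypothetical helper, and the guarded import is only there so the snippet runs where TensorFlow is not installed.

```python
try:
    import tensorflow as tf
except ImportError:
    tf = None  # TensorFlow not installed in this environment


def visible_gpus():
    """Return the names of the GPUs TensorFlow can see (empty list if none)."""
    if tf is None:
        return []
    # Assumes TensorFlow >= 2.1; older releases expose this under
    # tf.config.experimental instead.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


print(visible_gpus())
```

An empty list with a conda-installed tensorflow-gpu would suggest the driver, not the toolkit, is the missing piece.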

Error with tensorFlow

I have a problem with TensorFlow. I'm trying to install it with GPU support on Manjaro Linux with a GTX 1060.
When I try to import tensorFlow in python with:
import tensorflow as tf
I get this error:
{...} ImportError: libcublas.so.8.0: cannot open shared object file:
No such file or directory {...}
I installed tensorflow-gpu with pip: sudo pip install tensorflow-gpu
When I tried to install cuda-8.0 (with pacaur -Syu cuda-8.0), I got an error after a very long load. Now when I try to install it again, I get:
Errors occurred, no packages were upgraded
even though it is not in my pacaur package list and nothing indicates it is being reinstalled.
I installed Keras with: sudo pip install Keras
I installed cuDNN with: pacaur -Syu cudnn
I installed my NVIDIA driver with (if I remember correctly): pacaur -Syu nvidia
I am not familiar with Manjaro. Assuming you want to install TensorFlow 1.4, the order would be:
1. Install the latest NVIDIA driver (version 384.xx or higher). Check its status in a terminal with nvidia-smi.
2. Install CUDA 8.0 without the GPU driver (you already installed it in step 1).
3. Add PATH=/usr/local/cuda-8.0/bin to the environment (in Ubuntu it's /etc/environment).
4. Add the driver and CUDA paths to LD_LIBRARY_PATH. In Ubuntu, this is done by adding export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:/usr/local/cuda/lib64:/usr/lib/nvidia-384:/usr/local/cuda/extras/CUPTI/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}} to /etc/bash.bashrc. At this point, you should be able to check the CUDA version with nvcc --version.
5. Copy the cuDNN files somewhere and add that path to LD_LIBRARY_PATH. cuDNN needs no installation.
6. Install TensorFlow 1.4.
If you want to install other versions of TensorFlow, you need to first check the supported versions of CUDA and cuDNN.
Hope this helps.
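To see whether the library named in the ImportError is reachable through LD_LIBRARY_PATH after these steps, a small check like the one below can be used. It is only a sketch and find_library is a hypothetical helper. Note that the dynamic loader also consults the ldconfig cache and default system directories, so an empty result is a hint, not proof.

```python
import os


def find_library(name, search_path=None):
    """Return the directories of an LD_LIBRARY_PATH-style string that contain `name`."""
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        # Empty entries come from leading, trailing or doubled separators.
        if directory and os.path.exists(os.path.join(directory, name)):
            hits.append(directory)
    return hits


# libcublas.so.8.0 is the file named in the ImportError above.
print(find_library("libcublas.so.8.0"))
```

If this prints an empty list, CUDA 8.0 is either not installed or its lib64 directory is missing from LD_LIBRARY_PATH in the shell that runs Python.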

Tensorflow 0.7.1 with Cuda Toolkit 7.5 and cuDNN 7.0

I recently tried to upgrade my Tensorflow installation from 0.6 to 0.7.1 (Ubuntu 15.10, Python 2.7) because it is described as compatible with more up-to-date Cuda libraries. Everything works well, including the simple test from the Tensorflow getting started page. However, I'm not able to use cuDNN. When running a program that uses cuDNN, I first get a warning
"Unable to load cuDNN DSO"
and later the program crashes with
I tensorflow/core/common_runtime/gpu/gpu_device.cc:717] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:73] Allocating 3.30GiB bytes.
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:83] GPU 0 memory begins at 0x704a80000 extends to 0x7d80c8000
F tensorflow/stream_executor/cuda/cuda_dnn.cc:204] could not find cudnnCreate in cudnn DSO; dlerror: /usr/local/lib/python2.7/dist-packages/tensorflow/python/_pywrap_tensorflow.so: undefined symbol: cudnnCreate
The files I downloaded for the Cuda Installation were
cuda-repo-ubuntu1504-7-5-local_7.5-18_amd64.deb
and
cudnn-7.0-linux-x64-v4.0-prod.tgz
I followed the instructions on the Tensorflow getting started page with the exception of using cuDNN 7.0 instead of 6.5. $LD_LIBRARY_PATH is
"/usr/local/cuda/lib64"
I have no clue why cudnnCreate is not found. Is there somebody who has successfully installed this configuration and can give me advice?
I got the same error when I forgot to set the LD_LIBRARY_PATH and CUDA_HOME environment variables:
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME=/usr/local/cuda
I am following these instructions to install TensorFlow on Arch Linux:
https://github.com/ddigiorg/AI-TensorFlow/blob/master/install/install-TF_2016-02-27.md
It seems you need cuDNN v2 or above, which you can get by registering for their Accelerated Computing Developer Program, which usually takes 2 days:
https://developer.nvidia.com/accelerated-computing-developer
UPDATE: It seems you already have cuDNNv2
The link sent by jorgemf (thank you) describes a Python 3.5 installation and I almost switched to Python 3.5.
My last attempt with my present installation was to again copy the cuDNN libraries to /usr/local/cuda/lib64.
And it worked! So the problem is solved, although I still don't know why I had it.
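One way to check, independently of TensorFlow, whether the loader can find cuDNN and whether it exports cudnnCreate is to load it with ctypes. This is a rough sketch for Python 3; cudnn_status is a hypothetical helper and the default name libcudnn.so is an assumption about how the library is named on the system.

```python
import ctypes


def cudnn_status(libname="libcudnn.so"):
    """Try to load cuDNN through the dynamic loader and look up cudnnCreate."""
    try:
        lib = ctypes.CDLL(libname)
    except OSError as exc:
        # Raised when the loader cannot find or open the shared object.
        return f"cannot load {libname}: {exc}"
    # Attribute access on a CDLL performs a symbol lookup, so hasattr
    # is False when the symbol is not exported.
    if hasattr(lib, "cudnnCreate"):
        return f"{libname} loaded and exports cudnnCreate"
    return f"{libname} loaded but cudnnCreate is missing"


print(cudnn_status())
```

A "cannot load" result points at LD_LIBRARY_PATH or at the files missing from /usr/local/cuda/lib64, which matches what fixed it here.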
Fix for Windows 10 users:
Download the cuDNN v5.1 library for Windows 10 from the CUDA site, registering if necessary.
Copy cudnn64_5.dll (cuda\bin\cudnn64_5.dll) from that zip archive into
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\
if C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0 is your install path for the CUDA toolkit.
Ubuntu 14.04 && cudnnV5.0 && Cuda7.5
I got the same error and solved it another way.
Following the official get-started page, I installed cuDNN with the commands below, which basically just copy the files into the CUDA directory
https://www.tensorflow.org/versions/r0.10/get_started/os_setup.html#optional-install-cuda-gpus-on-linux
tar xvzf cudnn-7.5-linux-x64-v5.1-ga.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
But after doing this, if we use the ll command to list the files in /usr/local/cuda/lib64 and compare them with the original files
ll
it seems that the soft links were broken by the copy,
so I deleted them and recreated them manually, like this:
sudo rm libcudnn.so.5 libcudnn.so
sudo ln -sf libcudnn.so.5 libcudnn.so
sudo ln -sf libcudnn.so.5.1.3 libcudnn.so.5
after that, execute
sudo ldconfig /usr/local/cuda/lib64
and it finally worked!
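The same inspection can be scripted instead of comparing ll output by eye. The sketch below reports, for each expected link name, whether it is missing, a regular file, a working symlink or a broken one. check_links is a hypothetical helper, and the names are taken from the commands above. As far as I know, plain cp without -P or -a follows symlinks and writes regular copies, which would show up here as "regular file".

```python
import os


def check_links(lib_dir, names=("libcudnn.so", "libcudnn.so.5")):
    """Describe each expected cuDNN symlink found in lib_dir."""
    report = {}
    for name in names:
        path = os.path.join(lib_dir, name)
        if not os.path.lexists(path):
            report[name] = "missing"
        elif not os.path.islink(path):
            report[name] = "regular file, not a symlink"
        elif os.path.exists(path):
            # exists() follows the link, so the final target is present.
            report[name] = "-> " + os.path.realpath(path)
        else:
            report[name] = "broken link -> " + os.readlink(path)
    return report


for name, state in check_links("/usr/local/cuda/lib64").items():
    print(name, state)
```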
