pip does not find the cudatoolkit that conda has installed - python

I'm trying to install torch_scatter with pip. However it gives me an error message:
File "/home1/huangjiawei/miniconda3/envs/lin/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 404, in build_extensions
self._check_cuda_version()
File "/home1/huangjiawei/miniconda3/envs/lin/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 781, in _check_cuda_version
raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
RuntimeError:
The detected CUDA version (9.0) mismatches the version that was used to compile
PyTorch (11.3). Please make sure to use the same CUDA versions.
But i did install cudatoolkit by conda:
(lin) huangjiawei#ai-server-2:~/linzhijie_Weakly-supervised-Query-based-Video-Segmentation$ conda list|grep cuda
cudatoolkit 11.3.1 h2bc3f7f_2 defaults
pytorch 1.10.2 py3.8_cuda11.3_cudnn8.2.0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
pytorch-mutex 1.0 cuda https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
So it seems that pip only detected the cuda version of the sever,but didn't detected the cuda version in my enviroment.
How to fix it?

This error is complaining that your system CUDA compiler (nvcc) version doesn't match. cudatoolkit you installed in conda is CUDA runtime. These two are different components.
To install CUDA compiler, you need to install the CUDA toolkit from NVIDIA

Related

Enable Nvidia GPU CUDA support for theano and PyMC3 for Windows 11

Has anyone figured out how to install GPU support for Theano and PyMC3 on Windows 11? I keep getting an error message about a DLL load failure. Here are the steps I’ve taken:
Install MSVC C++ Build tools and add to system variable path
Install latest Nvidia drivers, CUDA toolkit, and added cuDNN .dll files to folders
Created .theanorc with the following line [global], device = cuda, floatX = float32 and added to base directory
Added 'THEANO_FLAGS' system variable with floatX=float32,device=cuda (is this redundant with step 3?)
Created the following conda environment:
conda create -n pymc_env_gpu -c conda-forge python libpython mkl-service numba python-graphviz scipy arviz pandas scikit-learn m2w64-toolchain pygpu ipykernel statsmodels
Activate new env and install pymc3 with pip install pymc3
The error message: ImportError: DLL load failed while importing m169e842c8a94977b534b5d54311bcacefb2595c2a7bb1124600448dca20d9048: The specified module could not be found.
Any help would be appreciated!

Unable to use GPU in Anaconda environment

I want to use GPU & Anaconda environment on Linux.
I'm supposed to have adapted the versions of each module, but it doesn't work.
Cuda and cuDNN are installed by using conda.
The versions of each module and driver are listed below:
・GPU:RTX 2070 SUPEER
・OS:Linux Mint 19.3 Tricia ( Ubuntu 18.04 )
・Nvidia-driver:435.21
# conda list tensorflow
tensorflow 2.1.0 gpu_py37h7a4bb67_0
tensorflow-base 2.1.0 gpu_py37h6c5654b_0
tensorflow-estimator 2.1.0 pyhd54b08b_0
tensorflow-gpu 2.1.0 h0d30ee6_0
# conda list cudnn
cudnn 7.6.5 cuda10.1_0
# conda list cudatoolkit
cudatoolkit 10.1.243 h6bb024c_0
I can see the GPU by entering the following command
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
When I run the training script, I get the following error
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1d_3/convolution ......
How do I get it to work correctly?
Root cause: lack of hardware resource.
Workaround:
Fresh installed TF 2.0 and ran a simple Minst tutorial, it was alright, opened another notebook, tried to run and encountered this issue.
I exited all notebooks and restarted Jupyter and open only one notebook, ran it successfully. Issue seems to be either memory or running more than one notebook on GPU
More reading here.

Is it still necessary to install CUDA before using the conda tensorflow-gpu package?

When I install tensorflow-gpu through Conda; it gives me the following output:
conda install tensorflow-gpu
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /home/psychotechnopath/anaconda3/envs/DeepLearning3.6
added / updated specs:
- tensorflow-gpu
The following packages will be downloaded:
package | build
---------------------------|-----------------
_tflow_select-2.1.0 | gpu 2 KB
cudatoolkit-10.1.243 | h6bb024c_0 347.4 MB
cudnn-7.6.5 | cuda10.1_0 179.9 MB
cupti-10.1.168 | 0 1.4 MB
tensorflow-2.1.0 |gpu_py36h2e5cdaa_0 4 KB
tensorflow-base-2.1.0 |gpu_py36h6c5654b_0 155.9 MB
tensorflow-gpu-2.1.0 | h0d30ee6_0 3 KB
------------------------------------------------------------
Total: 684.7 MB
The following NEW packages will be INSTALLED:
cudatoolkit pkgs/main/linux-64::cudatoolkit-10.1.243-h6bb024c_0
cudnn pkgs/main/linux-64::cudnn-7.6.5-cuda10.1_0
cupti pkgs/main/linux-64::cupti-10.1.168-0
tensorflow-gpu pkgs/main/linux-64::tensorflow-gpu-2.1.0-h0d30ee6_0
I see that installing tensorflow-gpu automatically triggers the installation of the cudatoolkit and cudnn. Does this mean that I no longer need to install CUDA and CUDNN manually anymore to be able to use tensorflow-gpu? Where does this conda installation of CUDA reside?
I first installed CUDA and CuDNN the old way (e.g. by following these installation instructions: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html )
And then I noticed that tensorflow-gpu was also installing cuda and cudnn
Do i now have two versions of CUDA/CuDNN installed and how do I check this?
Do i now have two versions of CUDA installed and how do I check this?
No.
conda installs the bare minimum redistributable library components required to support the CUDA accelerated packages they offer. The package name cudatoolkit is a complete misnomer. It is nothing of the sort. Even though it is now greatly expanded in scope from what it used to be (literally 5 files -- I think at some point they must have gotten a licensing deal from NVIDIA because some of this wasn't/isn't on the official "freely redistributable" list AFAIK), it still is basically just a handful of libraries.
You can check this for yourself:
cat /opt/miniconda3/conda-meta/cudatoolkit-10.1.168-0.json
{
"build": "0",
"build_number": 0,
"channel": "https://repo.anaconda.com/pkgs/main/linux-64",
"constrains": [],
"depends": [],
"extracted_package_dir": "/opt/miniconda3/pkgs/cudatoolkit-10.1.168-0",
"features": "",
"files": [
"lib/cudatoolkit_config.yaml",
"lib/libcublas.so",
"lib/libcublas.so.10",
"lib/libcublas.so.10.2.0.168",
"lib/libcublasLt.so",
"lib/libcublasLt.so.10",
"lib/libcublasLt.so.10.2.0.168",
"lib/libcudart.so",
"lib/libcudart.so.10.1",
"lib/libcudart.so.10.1.168",
"lib/libcufft.so",
"lib/libcufft.so.10",
"lib/libcufft.so.10.1.168",
"lib/libcufftw.so",
"lib/libcufftw.so.10",
"lib/libcufftw.so.10.1.168",
"lib/libcurand.so",
"lib/libcurand.so.10",
"lib/libcurand.so.10.1.168",
"lib/libcusolver.so",
"lib/libcusolver.so.10",
"lib/libcusolver.so.10.1.168",
"lib/libcusparse.so",
"lib/libcusparse.so.10",
"lib/libcusparse.so.10.1.168",
"lib/libdevice.10.bc",
"lib/libnppc.so",
"lib/libnppc.so.10",
"lib/libnppc.so.10.1.168",
"lib/libnppial.so",
"lib/libnppial.so.10",
"lib/libnppial.so.10.1.168",
"lib/libnppicc.so",
"lib/libnppicc.so.10",
"lib/libnppicc.so.10.1.168",
"lib/libnppicom.so",
"lib/libnppicom.so.10",
"lib/libnppicom.so.10.1.168",
"lib/libnppidei.so",
"lib/libnppidei.so.10",
"lib/libnppidei.so.10.1.168",
"lib/libnppif.so",
"lib/libnppif.so.10",
"lib/libnppif.so.10.1.168",
"lib/libnppig.so",
"lib/libnppig.so.10",
"lib/libnppig.so.10.1.168",
"lib/libnppim.so",
"lib/libnppim.so.10",
"lib/libnppim.so.10.1.168",
"lib/libnppist.so",
"lib/libnppist.so.10",
"lib/libnppist.so.10.1.168",
"lib/libnppisu.so",
"lib/libnppisu.so.10",
"lib/libnppisu.so.10.1.168",
"lib/libnppitc.so",
"lib/libnppitc.so.10",
"lib/libnppitc.so.10.1.168",
"lib/libnpps.so",
"lib/libnpps.so.10",
"lib/libnpps.so.10.1.168",
"lib/libnvToolsExt.so",
"lib/libnvToolsExt.so.1",
"lib/libnvToolsExt.so.1.0.0",
"lib/libnvblas.so",
"lib/libnvblas.so.10",
"lib/libnvblas.so.10.2.0.168",
"lib/libnvgraph.so",
"lib/libnvgraph.so.10",
"lib/libnvgraph.so.10.1.168",
"lib/libnvjpeg.so",
"lib/libnvjpeg.so.10",
"lib/libnvjpeg.so.10.1.168",
"lib/libnvrtc-builtins.so",
"lib/libnvrtc-builtins.so.10.1",
"lib/libnvrtc-builtins.so.10.1.168",
"lib/libnvrtc.so",
"lib/libnvrtc.so.10.1",
"lib/libnvrtc.so.10.1.168",
"lib/libnvvm.so",
"lib/libnvvm.so.3",
"lib/libnvvm.so.3.3.0"
]
.....
i.e. what you get is (keeping in mind most of those "files" above are just symlinks)
CUBLAS runtime
The CUDA runtime library
CUFFT runtime
CUrand runtime
CUsparse rutime
CUsolver runtime
NPP runtime
nvblas runtime
NVTX runtime
NVgraph runtime
NVjpeg runtime
NVRTC/NVVM runtime
The CUDNN package that conda installs is the redistributable binary distribution which is identical to what NVIDIA distribute -- which is exactly two files, a header file and a library.
You would still require a supported NVIDIA driver installation to make the tensorflow which conda installs work.
If you want to actually compile and build CUDA code, you need to install a separate CUDA toolkit which contains all the the development components which conda deliberately omits from their distribution.

get the CUDA and CUDNN version on windows with Anaconda installe

There is a tensorflow-gpu version installed on Windows using Anaconda, how to check the CUDA and CUDNN version of it? Thanks.
Use the following command to check CUDA installation by Conda:
conda list cudatoolkit
And the following command to check CUDNN version installed by conda:
conda list cudnn
If you want to install/update CUDA and CUDNN through CONDA, please use the following commands:
conda install -c anaconda cudatoolkit
conda install -c anaconda cudnn
Alternatively you can use following commands to check CUDA installation:
nvidia-smi
OR
nvcc --version
You could also run conda list from the anaconda command line:
conda list cudnn
# packages in environment at C:\Anaconda2:
#
# Name Version Build Channel
cudnn 6.0 0
Although not a public documented API, you can currently access it like this:
from tensorflow.python.platform import build_info as tf_build_info
print(tf_build_info.cuda_version_number)
# 9.0 in v1.10.0
print(tf_build_info.cudnn_version_number)
# 7 in v1.10.0
As of TensorFlow 2.4.1, We can use tensorflow.python.platform.build_info to get information on which CUDA, cuDNN the binary was built against.
>>> import tensorflow
>>> print(tensorflow.__version__)
'2.4.1'
>>> import tensorflow.python.platform.build_info as build
>>> print(build.build_info)
OrderedDict([('cpu_compiler', '/usr/bin/gcc-5'), ('cuda_compute_capabilities', ['sm_35', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'compute_80']), ('cuda_version', '11.0'), ('cudnn_version', '8'), ('is_cuda_build', True), ('is_rocm_build', False)])
The build.build_info is an OrderedDict. So to get CuDNN and CUDA versions:
>>> print(build.build_info['cuda_version'])
11.0
>>> print(build.build_info['cudnn_version'])
8
Note: As this is not a public API, things can change in future versions. In previous versions, we could do from tensorflow.python.platform import build_info as tf_build_info; print(tf_build_info.cuda_version_number) like in jdehesa's answer.

"import torch" giving error "from torch._C import *, DLL load failed: The specified module could not be found"

I am currently using Python 3.5.5 on Anaconda and I am unable to import torch. It is giving me the following error in Spyder:
Python 3.5.5 |Anaconda, Inc.| (default, Mar 12 2018, 17:44:09) [MSC v.1900
64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.
IPython 6.2.1 -- An enhanced Interactive Python.
import torch
Traceback (most recent call last):
File "<ipython-input-1-eb42ca6e4af3>", line 1, in <module>
import torch
File "C:\Users\trish\Anaconda3\envs\virtual_platform\lib\site-
packages\torch\__init__.py", line 76, in <module>
from torch._C import *
ImportError: DLL load failed: The specified module could not be found.
Many suggestions on the internet say that the working directory should not be the same directory that the torch package is in, however I've manually set my working directory to C:/Users/trish/Downloads, and I am getting the same error.
Also I've already tried the following: reinstalling Anaconda and all packages from scratch, and I've ensured there is no duplicate "torch" folder in my directory.
Pls help! Thank you!
I had this similar problem in windows 10...
Solution:
Download win-64/intel-openmp-2018.0.0-8.tar.bz2 from https://anaconda.org/anaconda/intel-openmp/files
Extract it and put the dll files in Library\bin into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin
Make sure your cuda directory is added to your %PATH% environment variable
I had the same problem. In my case I didn't want the GPU version of pytorch.
I uninstalled it. The version was pytorch: 0.3.1-py36_cuda80_cudnn6he774522_2 peterjc123.
The problem is that cuda and cudnn . then installed with the following command and now it works!
conda install -c peterjc123 pytorch-cpu
I also encountered the same problem when I used a conda environment with python 3.6.8 and pytorch installed by conda from channel -c pytorch.
Here is what worked for me:
1:) conda create -n envName python=3.6 anaconda
2:) conda update -n envName conda
3:) conda activate envName
4:) conda install pytorch torchvision cudatoolkit=9.0 -c pytorch
and then tested torch with the given code:
5:) python -c "import torch; print(torch.cuda.get_device_name(0))"
Note: 5th step will return your gpu name if you have a cuda compatible gpu
Summary: I just created a conda environment containing whole anaconda and then to tackle the issue of unmatched conda version I updated conda of new environment from the base environment and then installed pytorch in that environment and tested pytorch.
For CPU version, here is the link for my another answer: https://gist.github.com/peterjc123/6b804651288e76db7b5fabe5348e1f03#gistcomment-2842825
https://gist.github.com/peterjc123/6b804651288e76db7b5fabe5348e1f03#gistcomment-2842837
Had the same problem and fixed it by re-installing numpy with mkl (Intel's math kernel library)
https://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy
Download the right .whl for your machine. For me it was numpy‑1.14.5+mkl‑cp36‑cp36m‑win_amd64.whl (python 3.6, windows, 64-bit)
and then install using pip.
pip install numpy‑1.14.5+mkl‑cp36‑cp36m‑win_amd64.whl
I am using a Windows 10 computer with an NVIDIA GeForce graphics card. NVIDIA showed I had CUDA 10.1, but I was getting this error when running import torch in Jupyter Lab and suspected it had something to do with CUDA support.
I fixed this problem by downloading and installing the CUDA Toolkit directly from NVIDIA. It installed all required Visual Studio components. When I returned to Jupyter Lab, import torch ran without error.
Make sure you installed the right version of pytorch for your enviroment. I had the same problem I was using pytorch on windows but I had the default package installed which was meant for cuda 8. So I reinstalled the pytorch package for cpu which was what I needed.
I had the same issue with running torch installed with pure pip and solved it by switching to conda.
Following steps:
uninstall python 3.6 from python.org (if exists)
install miniconda
install torch in conda ("conda install pytorch -c pytorch")
Issue with pip installation:
import torch
File "C:\Program Files\Python35\lib\site-packages\torch\__init__.py", line 78, in <module>
from torch._C import *
ImportError: DLL load failed: The specified module could not be found.
After switching to conda it works fine. I believe the issue was resolved by conda through installing the vs_redist 2017
vs2017_runtime 15.4.27004.2010 peterjc123
But I have tried it w/o conda and it did not help. Could not find how to check (and tweak) Python's vs_redist.
Windows10 Solution(This worked for my system):
I was having the same issue in my system. Previously I was using Python 3.5 and I created a virtual environment named pytorch_test using the virtualenv module because I didn't want to mess up my tensorflow installation(which took me a lot of time). I followed every instruction but it didn't seem to work. I installed python 3.6.7 added it to the path. Then I created the virtual environment using:
virtualenv --python=3.6 pytorch_test
Then go to the destination folder
cd D:\pytorch_test
and activate the virtual environment entering the command in cmd:
.\Scripts\activate
After you do this the command prompt will show:
(pytorch_test) D:\pytorch_test>
Update pip if you have not done it before using:
(pytorch_test) D:\pytorch_test>python -m pip install --upgrade pip
Then go for installing numpy+mkl from the site:
https://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy
Choose the correct version from the list if you have python 3.6.7 go with the wheel file:
numpy‑1.15.4+mkl‑cp36‑cp36m‑win_amd64.whl (For 64 bit)
(Note if the whole thing doesnot work just go with simple numpy installation and mkl installation separately)
Then go for installing openmp using:
(pytorch_test) D:\pytorch_test>pip install intel-openmp
Now you are done with the prerequisites. To install pytorch go to the previous versions site:
https://pytorch.org/get-started/previous-versions/
Here select the suitable version from the list of Windows Binaries. For example I am having CUDA 9.0 installed in my system with python 3.6.7 so I went with the gpu version:
cu90/torch-1.0.0-cp36-cp36m-win_amd64.whl
(There are two available versions 0.4.0 and 1.0.0 for pytorch, I went with 1.0.0)
After downloading the file install it using pip(assuming the whl file is in D:).You have to do this from the virtual environment pytorch_test itself:
(pytorch_test) D:\pytorch_test>pip install D:\torch-1.0.0-cp36-cp36m-win_amd64.whl
Prerequisites like six, pillow will be installed automatically.
Then once everything is done, install the models using torchvision.
Simply type :
(pytorch_test) D:\pytorch_test>pip install torchvision
To check everything is working fine try the following script:
import torch
test = torch.rand(4, 7)
print(test)
If everything was good then it wont be an issue. Whenever there is an issue like this it is related to version mismatch of one or more dependencies. This also occurred during tensorflow installation.
Deactivate the following virtual environment using the command deactivate in the cmd:
(pytorch_test) D:\pytorch_test>deactivate
This is the output of pip list in my system:
Package Version
------------ -----------
intel-openmp 2019.0
mkl 2019.0
numpy 1.16.2
Pillow 6.0.0
pip 19.0.3
setuptools 41.0.0
six 1.12.0
torch 1.0.0
torchvision 0.2.2.post3
wheel 0.33.1
Hope this helps. This is my first answer in this community, hope you all find it helpful. I setup pytorch today in the afternoon after trying all sorts of combinations. The same import problem occurred to me while installing CNTK and tensorflow. Anyway I kept them separate in different virtual environments so that I can use them anytime.

Categories

Resources