Tensorflow-GPU running on python shell but not in jupyter - python

I am trying to run tensorflow-gpu, and it works in the Python shell:
>>> import tensorflow as tf
>>> if tf.test.gpu_device_name():
...     print('Default GPU Device:{}'.format(tf.test.gpu_device_name()))
... else:
...     print("Please install GPU version of TF")
Output:
Default GPU Device:/device:GPU:0
But when I run the same code in Jupyter, I get this:
Output:
Please install GPU version of TF
https://i.imgur.com/tP4uHzA.png
I am using Anaconda. I installed tensorflow-gpu with both the conda installer and pip3/pip, but it does not work in Jupyter. Does anybody know what is wrong here? I have installed the CUDA toolkit and cuDNN, and the Nvidia drivers are up to date.
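One quick check, sketched here under the assumption that the shell and the notebook might be running different Python interpreters (a common cause of this symptom, though not confirmed for this setup): run the snippet below in both places and compare the output. The helper name interpreter_info is made up for illustration.

```python
import sys

def interpreter_info():
    """Return the interpreter path and environment prefix of the running Python."""
    return {"executable": sys.executable, "prefix": sys.prefix}

info = interpreter_info()
print(info["executable"])
print(info["prefix"])
```

If the two paths differ, the notebook kernel is probably not running in the environment where tensorflow-gpu was installed.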

Related

using a tensorflow model trained on google colab on my PC

I am using colab to train a tensorflow model. I see that google colab installs the following version by default:
import tensorflow
tensorflow.__version__
2.6.0
...
[train model]
...
model.save('mymodel.h5')
However, when I download the model to my Windows PC and try to load it with tensorflow/keras, I get an error:
import keras
import tensorflow
model = keras.models.load_model(r"mymodel.h5")
model_config = json.loads(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'
After searching on the net, it appears this is due to the different tensorflow versions (colab vs. my PC).
tensorflow.__version__
Out[4]: '2.1.0'
The problem is that this is the version I get when I install tensorflow with conda install tensorflow-gpu. Even forcing the version with conda install tensorflow-gpu==2.6 does not install anything.
What should I do?
Thanks!
A hacky solution for now:
1. Download tensorflow 2.1 + CUDA and cuDNN using conda install tensorflow-gpu
2. Upgrade using pip install tensorflow-gpu==2.6 --upgrade --force-reinstall
The GPU does not work (likely because the CUDA versions are not the right ones), but at least I can run a TF 2.6 script using the CPU.

Unable to use GPU in Anaconda environment

I want to use the GPU in an Anaconda environment on Linux.
I believe I have matched the versions of each module, but it doesn't work.
Cuda and cuDNN are installed by using conda.
The versions of each module and driver are listed below:
・GPU:RTX 2070 SUPER
・OS:Linux Mint 19.3 Tricia ( Ubuntu 18.04 )
・Nvidia-driver:435.21
# conda list tensorflow
tensorflow 2.1.0 gpu_py37h7a4bb67_0
tensorflow-base 2.1.0 gpu_py37h6c5654b_0
tensorflow-estimator 2.1.0 pyhd54b08b_0
tensorflow-gpu 2.1.0 h0d30ee6_0
# conda list cudnn
cudnn 7.6.5 cuda10.1_0
# conda list cudatoolkit
cudatoolkit 10.1.243 h6bb024c_0
I can see the GPU by entering the following command
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
When I run the training script, I get the following error
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1d_3/convolution ......
How do I get it to work correctly?
Root cause: a lack of hardware resources.
Workaround:
I freshly installed TF 2.0 and ran a simple MNIST tutorial, which worked. I then opened another notebook, tried to run it, and encountered this issue.
I closed all notebooks, restarted Jupyter, opened only one notebook, and ran it successfully. The issue seems to be caused either by memory or by running more than one notebook on the GPU.
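If the cause is indeed GPU memory being claimed by another notebook, one commonly used mitigation (an assumption here, not something the answer above tested) is to ask TensorFlow to allocate GPU memory on demand instead of reserving most of it at start-up. A minimal sketch using the TF_FORCE_GPU_ALLOW_GROWTH environment variable, which has to be set before TensorFlow is imported:

```python
import os

# Must run before the first `import tensorflow` in the process;
# TensorFlow reads this variable when it initialises the GPU.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf   # import TensorFlow only after the line above
print(os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
```

In a notebook this would go in the very first cell, in every notebook that shares the GPU.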

SageMaker Neo PyTorch 1.0.0

I've updated the torch version in my SageMaker pytorch_36 kernel to torch version 1.0.0. I then tried running the example notebook pytorch_torchvision_neo.ipynb, also changing the framework_version to 1.0.0. Neo compilation then fails.
Any idea why it isn't working with 1.0.0? The console error message actually tells me to make sure I'm using 1.0.0, but the example notebook seems to only work with 0.4.0.
The SageMaker notebook has pytorch-1.1.0 pre-installed,
but the model compilation service expects a model saved by pytorch-0.4.0 or pytorch-1.0.1.
Solution to the issue:
# 1. do not install `pytorch-cpu` and `torchvision-cpu`.
# 2. Downgrade pytorch version to 1.0.1
!conda install -y pytorch=1.0.1 -c pytorch
# 3. import pytorch and check that version is 1.0.1 (but not 1.1.0)
import torch
torch.__version__
Continue to run notebook steps: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_neo_compilation_jobs/pytorch_torchvision/pytorch_torchvision_neo.ipynb
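Step 3 above can also be made an explicit check. The sketch below is illustrative only: the helper is_supported and the constant SUPPORTED_TORCH_VERSIONS are hypothetical names, and the version list is simply the two versions named in this answer, not an official list.

```python
# Versions this answer says the compilation service accepts.
SUPPORTED_TORCH_VERSIONS = ("0.4.0", "1.0.1")

def is_supported(version):
    """Return True if a torch version string matches a supported release."""
    # Drop local build suffixes such as "+cpu" before comparing.
    base = version.split("+")[0]
    return base in SUPPORTED_TORCH_VERSIONS

print(is_supported("1.0.1"))
print(is_supported("1.1.0"))
```

In the notebook, calling is_supported(torch.__version__) before saving the model would catch a mismatch early.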

Get the CUDA and cuDNN version on Windows with Anaconda installed

A tensorflow-gpu version is installed on Windows using Anaconda. How can I check its CUDA and cuDNN versions? Thanks.
Use the following command to check CUDA installation by Conda:
conda list cudatoolkit
And the following command to check CUDNN version installed by conda:
conda list cudnn
If you want to install/update CUDA and CUDNN through CONDA, please use the following commands:
conda install -c anaconda cudatoolkit
conda install -c anaconda cudnn
Alternatively, you can use the following commands to check the CUDA installation:
nvidia-smi
OR
nvcc --version
You could also run conda list from the anaconda command line:
conda list cudnn
# packages in environment at C:\Anaconda2:
#
# Name                    Version                   Build  Channel
cudnn                     6.0                           0
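To pull the version out of that listing in a script, the text can be parsed line by line. This is a rough sketch that assumes the whitespace-separated layout shown above; parse_conda_list is a hypothetical helper, not part of conda.

```python
def parse_conda_list(text):
    """Map package name -> version from the text printed by `conda list`."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            versions[fields[0]] = fields[1]
    return versions

sample = """\
# packages in environment at C:\\Anaconda2:
#
# Name Version Build Channel
cudnn 6.0 0
"""
print(parse_conda_list(sample))
```

The input text could come from subprocess.run(["conda", "list", "cudnn"], capture_output=True, text=True).stdout, assuming conda is on the PATH.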
Although it is not a publicly documented API, you can currently access it like this:
from tensorflow.python.platform import build_info as tf_build_info
print(tf_build_info.cuda_version_number)
# 9.0 in v1.10.0
print(tf_build_info.cudnn_version_number)
# 7 in v1.10.0
As of TensorFlow 2.4.1, we can use tensorflow.python.platform.build_info to find out which CUDA and cuDNN versions the binary was built against.
>>> import tensorflow
>>> print(tensorflow.__version__)
2.4.1
>>> import tensorflow.python.platform.build_info as build
>>> print(build.build_info)
OrderedDict([('cpu_compiler', '/usr/bin/gcc-5'), ('cuda_compute_capabilities', ['sm_35', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'compute_80']), ('cuda_version', '11.0'), ('cudnn_version', '8'), ('is_cuda_build', True), ('is_rocm_build', False)])
build.build_info is an OrderedDict, so to get the cuDNN and CUDA versions:
>>> print(build.build_info['cuda_version'])
11.0
>>> print(build.build_info['cudnn_version'])
8
Note: As this is not a public API, things can change in future versions. In previous versions, we could do from tensorflow.python.platform import build_info as tf_build_info; print(tf_build_info.cuda_version_number) like in jdehesa's answer.
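Since the keys are not guaranteed to stay the same, reading them defensively avoids a KeyError. The sketch below works on any mapping shaped like the OrderedDict printed above; the helper name cuda_versions is hypothetical. In recent TensorFlow 2.x releases a similar mapping should also be available from tf.sysconfig.get_build_info(), which is public.

```python
def cuda_versions(build_info):
    """Return (cuda_version, cudnn_version); None for any missing key."""
    return build_info.get("cuda_version"), build_info.get("cudnn_version")

# Sample values copied from the TensorFlow 2.4.1 output above.
sample = {"cuda_version": "11.0", "cudnn_version": "8", "is_cuda_build": True}
print(cuda_versions(sample))
print(cuda_versions({"is_cuda_build": False}))
```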

"import torch" giving error "from torch._C import *, DLL load failed: The specified module could not be found"

I am currently using Python 3.5.5 on Anaconda and I am unable to import torch. It is giving me the following error in Spyder:
Python 3.5.5 |Anaconda, Inc.| (default, Mar 12 2018, 17:44:09) [MSC v.1900 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.
IPython 6.2.1 -- An enhanced Interactive Python.
import torch
Traceback (most recent call last):
  File "<ipython-input-1-eb42ca6e4af3>", line 1, in <module>
    import torch
  File "C:\Users\trish\Anaconda3\envs\virtual_platform\lib\site-packages\torch\__init__.py", line 76, in <module>
    from torch._C import *
ImportError: DLL load failed: The specified module could not be found.
Many suggestions on the internet say that the working directory should not be the directory that contains the torch package. However, I've manually set my working directory to C:/Users/trish/Downloads, and I am still getting the same error.
Also I've already tried the following: reinstalling Anaconda and all packages from scratch, and I've ensured there is no duplicate "torch" folder in my directory.
Pls help! Thank you!
I had a similar problem on Windows 10.
Solution:
Download win-64/intel-openmp-2018.0.0-8.tar.bz2 from https://anaconda.org/anaconda/intel-openmp/files
Extract it and copy the DLL files from Library\bin into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin
Make sure your CUDA directory is added to your %PATH% environment variable
I had the same problem. In my case I didn't want the GPU version of pytorch.
I uninstalled it. The version was pytorch: 0.3.1-py36_cuda80_cudnn6he774522_2 peterjc123.
The problem was related to CUDA and cuDNN. I then installed the CPU version with the following command, and now it works:
conda install -c peterjc123 pytorch-cpu
I also encountered the same problem when I used a conda environment with python 3.6.8 and pytorch installed by conda from channel -c pytorch.
Here is what worked for me:
1. conda create -n envName python=3.6 anaconda
2. conda update -n envName conda
3. conda activate envName
4. conda install pytorch torchvision cudatoolkit=9.0 -c pytorch
Then test torch with the following command:
5. python -c "import torch; print(torch.cuda.get_device_name(0))"
Note: step 5 prints your GPU name if you have a CUDA-compatible GPU.
Summary: I created a conda environment containing the full Anaconda distribution. To address the mismatched conda version, I updated conda in the new environment from the base environment, then installed PyTorch in that environment and tested it.
For the CPU version, see my other answer: https://gist.github.com/peterjc123/6b804651288e76db7b5fabe5348e1f03#gistcomment-2842825
https://gist.github.com/peterjc123/6b804651288e76db7b5fabe5348e1f03#gistcomment-2842837
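The one-liner in step 5 raises an exception when torch cannot be imported or no CUDA device is present. A slightly more forgiving variant is sketched below; the helper name cuda_device_name is hypothetical, and catching OSError in addition to ImportError is an assumption about how DLL load failures may surface in different PyTorch versions.

```python
def cuda_device_name():
    """Return the name of GPU 0, or None if torch or CUDA is unavailable."""
    try:
        import torch
    except (ImportError, OSError):
        # A missing package or a failed DLL load both end up here.
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_name(0)

print(cuda_device_name())
```

A result of None means either that torch did not import or that no CUDA device was detected; it does not say which.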
Had the same problem and fixed it by re-installing numpy with mkl (Intel's math kernel library)
https://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy
Download the right .whl for your machine. For me it was numpy-1.14.5+mkl-cp36-cp36m-win_amd64.whl (Python 3.6, Windows, 64-bit)
and then install it using pip:
pip install numpy-1.14.5+mkl-cp36-cp36m-win_amd64.whl
I am using a Windows 10 computer with an NVIDIA GeForce graphics card. NVIDIA showed I had CUDA 10.1, but I was getting this error when running import torch in Jupyter Lab and suspected it had something to do with CUDA support.
I fixed this problem by downloading and installing the CUDA Toolkit directly from NVIDIA. It installed all required Visual Studio components. When I returned to Jupyter Lab, import torch ran without error.
Make sure you installed the right version of PyTorch for your environment. I had the same problem: I was using PyTorch on Windows, but I had the default package installed, which was meant for CUDA 8. So I reinstalled the PyTorch package for CPU, which was what I needed.
I had the same issue with running torch installed with pure pip and solved it by switching to conda.
Following steps:
uninstall python 3.6 from python.org (if exists)
install miniconda
install torch in conda ("conda install pytorch -c pytorch")
Issue with pip installation:
import torch
  File "C:\Program Files\Python35\lib\site-packages\torch\__init__.py", line 78, in <module>
    from torch._C import *
ImportError: DLL load failed: The specified module could not be found.
After switching to conda, it works fine. I believe conda resolved the issue by installing the VS 2017 redistributable:
vs2017_runtime 15.4.27004.2010 peterjc123
I also tried this without conda, but it did not help. I could not find out how to check (and tweak) Python's vs_redist.
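One rough way to look for the runtime from Python is to search for its DLLs with ctypes. This is only a sketch: the DLL names below are an assumption about what the Visual C++ redistributable provides, and locate_runtime_dlls is a hypothetical helper.

```python
import ctypes.util

# Assumed names of the Visual C++ runtime DLLs; adjust as needed.
RUNTIME_DLLS = ("vcruntime140", "msvcp140")

def locate_runtime_dlls(names=RUNTIME_DLLS):
    """Map each DLL name to the path found on the search path, or None."""
    return {name: ctypes.util.find_library(name) for name in names}

for name, path in locate_runtime_dlls().items():
    print(name, "->", path)
```

On Windows, find_library only looks along PATH, so a None result is a hint rather than proof that the runtime is missing. On other operating systems None is the expected result.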
Windows 10 solution (this worked on my system):
I was having the same issue on my system. Previously I was using Python 3.5, and I created a virtual environment named pytorch_test using the virtualenv module because I didn't want to mess up my tensorflow installation (which took me a lot of time). I followed every instruction, but it didn't seem to work. I then installed Python 3.6.7 and added it to the path. Then I created the virtual environment using:
virtualenv --python=3.6 pytorch_test
Then go to the destination folder
cd D:\pytorch_test
and activate the virtual environment entering the command in cmd:
.\Scripts\activate
After you do this the command prompt will show:
(pytorch_test) D:\pytorch_test>
Update pip if you have not done it before using:
(pytorch_test) D:\pytorch_test>python -m pip install --upgrade pip
Then go for installing numpy+mkl from the site:
https://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy
Choose the correct version from the list. If you have Python 3.6.7, go with the wheel file:
numpy-1.15.4+mkl-cp36-cp36m-win_amd64.whl (for 64-bit)
(Note: if this does not work, just install numpy and mkl separately.)
Then go for installing openmp using:
(pytorch_test) D:\pytorch_test>pip install intel-openmp
Now you are done with the prerequisites. To install pytorch go to the previous versions site:
https://pytorch.org/get-started/previous-versions/
Here, select the suitable version from the list of Windows binaries. For example, I have CUDA 9.0 installed on my system with Python 3.6.7, so I went with the GPU version:
cu90/torch-1.0.0-cp36-cp36m-win_amd64.whl
(There are two available versions of PyTorch, 0.4.0 and 1.0.0; I went with 1.0.0.)
After downloading the file, install it using pip (assuming the .whl file is in D:). You have to do this from within the virtual environment pytorch_test:
(pytorch_test) D:\pytorch_test>pip install D:\torch-1.0.0-cp36-cp36m-win_amd64.whl
Prerequisites like six, pillow will be installed automatically.
Once everything is done, install torchvision, which provides the models.
Simply type:
(pytorch_test) D:\pytorch_test>pip install torchvision
To check everything is working fine try the following script:
import torch
test = torch.rand(4, 7)
print(test)
If everything is set up correctly, this runs without errors. Whenever there is an issue like this, it is usually related to a version mismatch of one or more dependencies. This also occurred for me during the tensorflow installation.
Deactivate the virtual environment using the deactivate command in cmd:
(pytorch_test) D:\pytorch_test>deactivate
This is the output of pip list in my system:
Package Version
------------ -----------
intel-openmp 2019.0
mkl 2019.0
numpy 1.16.2
Pillow 6.0.0
pip 19.0.3
setuptools 41.0.0
six 1.12.0
torch 1.0.0
torchvision 0.2.2.post3
wheel 0.33.1
Hope this helps. This is my first answer in this community. I set up PyTorch this afternoon after trying all sorts of combinations. The same import problem occurred for me while installing CNTK and tensorflow. I keep them in separate virtual environments so that I can use any of them at any time.
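The versions in the pip list output above can also be read from inside Python, which helps when comparing two environments. A minimal sketch, assuming Python 3.8 or newer for importlib.metadata (the environment in this answer uses 3.6, where pip list remains the simpler option); installed_versions is a hypothetical helper name.

```python
from importlib import metadata

def installed_versions(packages):
    """Return {package: version}, with None for packages that are not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found

# Package names taken from the pip list output above.
wanted = ["intel-openmp", "mkl", "numpy", "torch", "torchvision"]
for name, version in installed_versions(wanted).items():
    print(name, version)
```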
