undefined symbol: cudnnCreate in ubuntu google cloud vm instance - python

I'm trying to run a TensorFlow Python script in a Google Cloud VM instance with GPU enabled. I have followed the process for installing the GPU driver, CUDA, cuDNN and TensorFlow. However, whenever I try to run my program (which runs fine on a supercomputing cluster), I keep getting:
undefined symbol: cudnnCreate
I have added the following to my ~/.bashrc:
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64"
export CUDA_HOME="/usr/local/cuda-8.0"
export PATH="$PATH:/usr/local/cuda-8.0/bin"
but it still does not work and produces the same error.

Answering my own question: the issue was not that the library was missing; the installed library was the wrong version (cuDNN 5.0), so the symbol could not be found. However, even after installing the right version it still didn't work, due to incompatibilities between the driver, CUDA and cuDNN versions. I solved all of these issues by reinstalling everything, including the driver, taking TensorFlow's library requirements into account.
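A quick way to verify the fix is to list the devices TensorFlow can see; importing TensorFlow and enumerating devices forces the CUDA/cuDNN libraries to load, so a version mismatch surfaces immediately instead of mid-training. A minimal sketch, assuming a TensorFlow 1.x install matching the CUDA 8.0 setup above:

# Sanity check: loads CUDA/cuDNN and lists the visible devices (TF 1.x)
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)
print(device_lib.list_local_devices())  # should include a GPU entry such as /gpu:0

If no GPU entry appears, the driver/CUDA/cuDNN combination is still inconsistent.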

Related

Tensorflow 2.4.1 - Couldn't invoke ptxas.exe

I'm trying to run TensorFlow with GPU support (GTX 1660 SUPER).
I created an environment using Anaconda, then installed cudatoolkit (version 11.0.221) and tensorflow-gpu (version 2.4.1). Afterwards, I downloaded cuDNN (version 8.0.4) and copied all files from cuDNN's bin folder to my environment's bin folder at anaconda3\envs\<env name>\Library\bin.
In my script, I enable GPU memory growth using tf.config.experimental.set_memory_growth (see the sketch below).
When I run the script (which uses convolutional algorithms), I get a warning that says Couldn't invoke ptxas.exe --version, which comes after a Call to CreateProcess failed. Error code: 2 error.
After the launch failure, I get: Relying on driver to perform ptx compilation. Modify $PATH to customize ptxas location.
I've already tried switching to cuDNN version 8.1.1.
How do I fix this?
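For reference, the memory-growth setup mentioned above typically looks like this (a minimal sketch; the convolutional model itself is omitted):

import tensorflow as tf

# Let TensorFlow allocate GPU memory on demand instead of grabbing it all up front
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_memory_growth(gpus[0], True)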
I got a new fix for this.
First I tried using tensorflow=2.3, cudnn=7.6.5 and cudatoolkit=10.1 as mentioned in previous answers. However, every time I started training a model, the process went stale and training seemed to be stuck in epoch 1.
I then managed to include ptxas in my conda environment by running conda install -c nvidia cuda-nvcc. The packages I am using are:
tensorflow=2.9, cudnn=8.1.0, cudatoolkit=11.2.2, cuda-nvcc=11.7.99 and python=3.9
Everything now runs flawlessly on Windows 10.
For the benefit of the community, adding @Zuk Levinson's comment, which solves the issue by using:
tensorflow=2.3, cudnn=7.6.5 and cudatoolkit=10.1
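With either combination, a tiny convolution is a quick way to confirm that the cuDNN/ptxas path actually works; if the setup is still broken, the CreateProcess/ptxas warnings reappear here. A minimal sketch:

import tensorflow as tf

# Exercises the cuDNN convolution path on the GPU
x = tf.random.normal([1, 32, 32, 3])
w = tf.random.normal([3, 3, 3, 8])
y = tf.nn.conv2d(x, w, strides=1, padding='SAME')
print(y.shape)  # expected: (1, 32, 32, 8)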

EXE made from Python file which uses Tensorflow-GPU does not use GPU when deployed

I have a Python file which uses TensorFlow-GPU. It uses the GPU when I run the file from the console with python MyFile.py.
However, when I convert it into an exe using PyInstaller, the conversion and the run both succeed, but the exe no longer uses the GPU. This happens on a system that was not used for developing MyFile.py. On the development system itself, the exe uses just 40-50% of the GPU, versus 90% when I run the Python script.
My application even has a small UI made using tkinter.
Though the application runs fine on the CPU, it is incredibly slow. (I am not using the --onefile flag in PyInstaller.) The machine has a GPU, but the application is not using it.
My questions are:
How do I overcome this issue? Do I need to install any CUDA or cuDNN toolkits on my destination computer?
(Once the main question is solved) Can I use a 1050 Ti in development and a 2080 Ti on the destination computer, if the cuDNN and CUDA versions are the same?
TensorFlow version: 1.14.0 (I know 2.x is out there, but this works perfectly fine for me.)
GPU: GeForce GTX 1050 Ti (in development as well as deployment)
CUDA toolkit: 10.0
cuDNN: v7.6.2 for CUDA 10.0
PyInstaller version: 3.5
Python version: 3.6.5
As I also answered here, according to the GitHub issues in the official repository (here and here, for example), CUDA libraries are usually loaded dynamically at run time rather than at link time, so they are typically not included in the final exe file (or folder). As a result, the generated exe won't work on a machine without CUDA installed. The solution (please refer to the linked issues too) is to put the DLLs needed to run the exe in its dist folder (if generated without the --onefile option) or to install the CUDA runtime on the target machine.
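For example, the DLLs can be added through the binaries list of the PyInstaller .spec file so they land next to the exe. A sketch, assuming CUDA 10.0; the DLL names and paths below are illustrative and must match whatever your TensorFlow build actually loads:

# MyFile.spec (fragment; the rest of the generated spec stays unchanged)
a = Analysis(
    ['MyFile.py'],
    binaries=[
        # (source path on the build machine, destination inside dist/)
        (r'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_100.dll', '.'),
        (r'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudnn64_7.dll', '.'),
    ],
)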

Using google cloud ml gpu on python 3.7

I am trying to run an ML model on Google Cloud ML. I am using PyTorch and want to use the GPU. With the standard Python 3.6 installed on the Google Cloud VM I get the error described below, and I tried solving it by upgrading to Python 3.7, but that version does not recognize the GPU that comes with the VM.
Whenever I run my code (which works when run locally) on the Google Cloud VM (with Python 3.6), I get the error:
python: symbol lookup error: /home/julsoles/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.so: undefined symbol: PySlice_Unpack
Trying to find a solution online, I figured out that this was an issue with the version of Python 3.6 and the only solution was to upgrade my version of Python.
I was able to upgrade my version of Python to Python3.7 in the Google Cloud VM and can run code with this new version using the command Python3.7 file.py.
Now, the issue is that whenever I run code using this version of Python, the VM does not recognize the GPU that comes with the system. I get the error
File "pred.py", line 75, in
predict(model_list, test_dataset) File "pred.py", line 28, in predict
x = Variable(torch.from_numpy(x).float()).cuda() File "/opt/anaconda3/lib/python3.7/site-packages/torch/cuda/init.py",
line 161, in _lazy_init
_check_driver() File "/opt/anaconda3/lib/python3.7/site-packages/torch/cuda/init.py",
line 75, in _check_driver
raise AssertionError("Torch not compiled with CUDA enabled") AssertionError: Torch not compiled with CUDA enabled
Right now, the only solution I have found is to run my code on the CPU alone, but it is painfully slow. Is there any way to make Python 3.7 recognize the GPU so that I can run my code on it?
Thanks for your help!
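A first step when debugging this is to check whether the Python 3.7 environment actually contains a CUDA-enabled PyTorch build, since "Torch not compiled with CUDA enabled" usually means a CPU-only wheel was installed. A minimal diagnostic sketch:

import torch

print(torch.__version__)          # CPU-only wheels are often tagged +cpu
print(torch.version.cuda)         # None for a CPU-only build
print(torch.cuda.is_available())  # must be True before .cuda() calls can work

If torch.version.cuda is None, reinstalling a CUDA-enabled PyTorch wheel into the Python 3.7 environment should make the GPU visible again.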

Is it possible to run tensorflow-gpu on a computer without a GPU or CUDA?

I have two Windows computers, one with and one without a GPU.
I would like to deploy the same Python script on both (TensorFlow 1.8 Object Detection), without changing the packaged version of TensorFlow. In other words, I want to run tensorflow-gpu on a CPU.
For the case where my script cannot detect nvcuda.dll, I've tried using a Session config to disable the GPU, like so:
config = tf.ConfigProto(
    device_count={'GPU': 0}
)
and:
with tf.Session(graph=detection_graph, config=config) as sess:
However, this is not sufficient, as TensorFlow still returns the error:
ImportError: Could not find 'nvcuda.dll'. TensorFlow requires that this DLL be installed in a directory that is named in your %PATH% environment variable.
Typically it is installed in 'C:\Windows\System32'. If it is not present, ensure that you have a CUDA-capable GPU with the correct driver installed.
Is there any way to disable checking for a GPU/CUDA entirely and default to CPU?
EDIT: I have read the year-old answer regarding tensorflow-gpu==1.0 on Linux posted here, which suggests this is impossible. I'm interested to know if this is still how tensorflow-gpu is compiled, 9 versions later.
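For completeness, the standard way to force CPU execution is to hide the GPUs before TensorFlow is imported; note, however, that the tensorflow-gpu package still loads the CUDA DLLs at import time, so on a machine without nvcuda.dll the import itself can still fail. A sketch:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # hide all GPUs from CUDA

import tensorflow as tf  # tensorflow-gpu still tries to load nvcuda.dll here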

Compiling binary with tensorflow library for cpu: Cannot find cuda library?

In development, I have been using the gpu-accelerated tensorflow
https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.1-cp35-cp35m-linux_x86_64.whl
I am attempting to deploy my trained model along with an application binary for my users. I compile with PyInstaller (3.3.dev0+f0df2d2bb) on Python 3.5.2 to turn the application into a binary.
For deployment, I install the cpu version, https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.2.1-cp35-cp35m-linux_x86_64.whl
However, after a successful compilation, I run my program and receive the infamous TensorFlow CUDA error:
tensorflow.python.framework.errors_impl.NotFoundError:
tensorflow/contrib/util/tensorflow/contrib/cudnn_rnn/python/ops/_cudnn_rnn_ops.so:
cannot open shared object file: No such file or directory
Why is it looking for CUDA when I've only got the CPU version installed? (Not to mention that I'm still on my development machine, which has CUDA, so it should find it anyway; I can use tensorflow-gpu/CUDA fine in uncompiled scripts. But this is irrelevant, because deployment machines won't have CUDA.)
My first thought was that I'm somehow importing the wrong TensorFlow, but I not only ran pip uninstall tensorflow-gpu, I also deleted the tensorflow-gpu package from /usr/local/lib/python3.5/dist-packages/.
Any ideas what could be happening? Maybe I need to start using a virtualenv...
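One quick check before rebuilding: confirm which TensorFlow package the build environment actually resolves, since a leftover GPU install can shadow the CPU wheel. A minimal sketch:

import tensorflow as tf

print(tf.__file__)     # the path shows which site-packages copy is picked up
print(tf.__version__)

If the path still points at the old tensorflow-gpu location, building from a clean virtualenv with only the CPU wheel installed ensures PyInstaller bundles the right package.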
