Importing TensorFlow stops Python program from running

I have Python Tools set up in Visual Studio with CPython installed.
In Visual Studio, if I run the following code:
print("hello");
import numpy;
print("hello");
The program runs fine, prints two 'hello', and exits normally.
However, if I run the following code:
print("hello");
import tensorflow;
print("hello");
The program prints one 'hello', then hangs and refuses to continue.
All packages should be correctly installed: using TensorFlow in the Python interactive window prints the correct output and works perfectly.
Why does the program hang in the second scenario?

Once you import tensorflow, it automatically tries to load CUDA and prints something like this:
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
So I think what is happening is that you don't have CUDA installed correctly, and the import is failing because of it. You can try installing the CPU version, which doesn't use the GPU and doesn't load those libraries.
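One way to check whether the import itself is what hangs is to run it in a separate interpreter with a timeout. The following is a minimal sketch using only the standard library; the helper name try_import and the 60-second default are assumptions chosen for illustration, not part of TensorFlow or Visual Studio.

```python
import subprocess
import sys


def try_import(module_name, timeout_seconds=60):
    """Import module_name in a child interpreter.

    Returns a (status, stderr) pair where status is "ok", "failed",
    or "timeout". The parent process never blocks indefinitely.
    """
    command = [sys.executable, "-c", "import " + module_name]
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # The child was killed after the timeout: the import hung.
        return "timeout", ""
    status = "ok" if result.returncode == 0 else "failed"
    return status, result.stderr
```

Calling try_import("tensorflow") returns the status together with whatever the child wrote to stderr, which is where the dso_loader messages above would appear if the CUDA libraries are being loaded.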

Related

TensorFlow crashes with Failed to create cuSolverDN instance when tf.linalg.inv is called

I'm running the tensorflow/tensorflow:latest-gpu Docker container. I can run simple vector operations like @ for matrix multiplication without a problem. However, when I run the following minimal example:
import tensorflow as tf
tf.linalg.inv(tf.eye(10))
I get the following error:
2021-02-15 16:18:20.375254: I tensorflow/core/util/cuda_solvers.cc:180] Creating CudaSolver handles for stream 0x528cf90
2021-02-15 16:18:20.375365: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-02-15 16:18:21.854945: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-02-15 16:18:21.934489: F tensorflow/core/util/cuda_solvers.cc:120] Check failed: cublasCreate(&cublas_handle) == CUBLAS_STATUS_SUCCESS Failed to create cuBlas instance.
The Python interpreter crashes. The CUDA and cuBLAS libraries are successfully opened, so I'm not sure what's causing the error.
The crash also happens with the tensorflow/tensorflow:devel-gpu image. When I try an earlier TensorFlow version (2.3), I do not get the error. However, I need to use >=2.4 because that's required by tensorflow_probability.
I'm on Pop!_OS (Ubuntu 20.10), using a GTX 1650.
Edit: Installing tf-nightly natively on the host system doesn't produce the error; tf.linalg.inv(tf.eye(10)) works fine. This does not solve the problem with the docker image (nightly image still produces the error), but I have a working GPU tensorflow environment for now.
This was solved by allowing memory growth on the GPU:
gpu = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(device=gpu[0], enable=True)
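The two lines above index gpu[0], which raises an IndexError when no GPU is visible. A slightly more defensive sketch loops over every detected GPU instead. The helper name enable_memory_growth is an assumption for illustration; it takes the TensorFlow module as a parameter so it can be exercised without a GPU. Memory growth has to be set before TensorFlow initializes the GPUs, so this should run right after the import.

```python
def enable_memory_growth(tf_module):
    """Enable memory growth on every visible GPU.

    tf_module is the imported tensorflow module. Returns the number
    of GPUs that were configured (0 if none are visible).
    """
    gpus = tf_module.config.list_physical_devices("GPU")
    for gpu in gpus:
        tf_module.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed here; nothing to configure.
        tf = None
    if tf is not None:
        print("Memory growth enabled on", enable_memory_growth(tf), "GPU(s)")
```

With memory growth enabled, TensorFlow allocates GPU memory as needed rather than reserving nearly all of it up front, which would leave room for the cuBLAS handle to be created.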

EXE made from Python file which uses Tensorflow-GPU does not use GPU when deployed

I have a Python file that uses TensorFlow GPU. It uses the GPU when I run the file from the console with python MyFile.py.
However, when I convert it into an exe using PyInstaller, it converts and runs successfully, but the exe no longer uses the GPU. This happens on a system that was not used for developing MyFile.py. On the system that was used for development, the exe uses just 40-50% of the GPU, compared with 90% when I run the Python script.
My application also has a small UI made with tkinter.
The application runs fine on the CPU, but it is incredibly slow. (I am not using the --onefile flag in PyInstaller.) Even though a GPU is present, the application does not use it.
My questions are:
How do I overcome this issue? Do I need to install the CUDA or cuDNN toolkits on the destination computer?
(Once the main question is solved) Can I use a 1050 Ti in development and a 2080 Ti on the destination computer, if the cuDNN and CUDA versions are the same?
Tensorflow Version : 1.14.0 (I know 2.x is out there, but this works perfectly fine for me.)
GPU : GeForce GTX 1050 ti ( In development as well as deployment.)
CUDA Toolkit : 10.0
cuDNN : v7.6.2 for CUDA 10.0
pyinstaller version : 3.5
Python version : 3.6.5
As I also answered elsewhere, according to GitHub issues in the official repository, CUDA libraries are usually loaded dynamically at run time rather than at link time, so they are typically not included in the final exe file (or folder). As a result, the generated exe won't work on a machine without CUDA installed. The solution is to put the DLLs needed to run the exe in its dist folder (if generated without the --onefile option) or to install the CUDA runtime on the target machine.
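To see which DLLs still need to be copied, a small check over the dist folder can help. This is only a sketch: the DLL names below are assumptions based on the usual naming for CUDA 10.0 and cuDNN 7 on Windows, the folder dist/MyFile is assumed from PyInstaller's default one-folder layout, and the helper name missing_from_dist is made up for illustration.

```python
from pathlib import Path

# Assumed file names for CUDA 10.0 / cuDNN 7 on Windows; adjust them
# to match the versions actually installed on the development machine.
ASSUMED_CUDA_DLLS = [
    "cudart64_100.dll",
    "cublas64_100.dll",
    "cufft64_100.dll",
    "curand64_100.dll",
    "cusolver64_100.dll",
    "cudnn64_7.dll",
]


def missing_from_dist(dist_dir, names):
    """Return the names that are not present as files in dist_dir."""
    dist = Path(dist_dir)
    return [name for name in names if not (dist / name).is_file()]


if __name__ == "__main__":
    # "dist/MyFile" is an assumed output folder for a one-folder build.
    for name in missing_from_dist("dist/MyFile", ASSUMED_CUDA_DLLS):
        print("Not in dist folder:", name)
```

Any name it reports would either need to be copied next to the exe or be provided by a CUDA installation on the target machine.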

Compiling binary with tensorflow library for cpu: Cannot find cuda library?

In development, I have been using the gpu-accelerated tensorflow
https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.1-cp35-cp35m-linux_x86_64.whl
I am attempting to deploy my trained model along with an application binary for my users. I use PyInstaller (3.3.dev0+f0df2d2bb) on Python 3.5.2 to build my application into a binary.
For deployment, I install the cpu version, https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.2.1-cp35-cp35m-linux_x86_64.whl
However, upon successful compilation, I run my program and receive the infamous tensorflow cuda error:
tensorflow.python.framework.errors_impl.NotFoundError:
tensorflow/contrib/util/tensorflow/contrib/cudnn_rnn/python/ops/_cudnn_rnn_ops.so:
cannot open shared object file: No such file or directory
Why is it looking for CUDA when I've only got the CPU version installed? (Not to mention that I'm still on my development machine with CUDA, so it should find it anyway. I can use tensorflow-gpu/CUDA fine in uncompiled scripts. But this is irrelevant because deployment machines won't have CUDA.)
My first thought was that I'm somehow importing the wrong tensorflow, but I've not only run pip uninstall tensorflow-gpu, I also deleted tensorflow-gpu from /usr/local/lib/python3.5/dist-packages/
Any ideas what could be happening? Maybe I need to start using a virtualenv.
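To rule out a leftover GPU package, it can help to list every installed distribution whose name starts with "tensorflow" from the interpreter that PyInstaller uses. The sketch below relies on importlib.metadata, which assumes Python 3.8 or newer (newer than the 3.5.2 mentioned above); the helper name installed_distributions is made up for illustration.

```python
from importlib import metadata


def installed_distributions(prefix="tensorflow"):
    """Map distribution name to version for names starting with prefix."""
    found = {}
    for dist in metadata.distributions():
        # Broken installs may have no Name field, so fall back to "".
        name = dist.metadata.get("Name", "") or ""
        if name.lower().startswith(prefix.lower()):
            found[name] = dist.version
    return found


if __name__ == "__main__":
    for name, version in sorted(installed_distributions().items()):
        print(name, version)
```

If both tensorflow and tensorflow-gpu show up, or the versions differ from what was expected, the environment being packaged is probably not the one that was intended.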

undefined symbol: cudnnCreate in ubuntu google cloud vm instance

I'm trying to run a TensorFlow Python script in a Google Cloud VM instance with a GPU enabled. I have followed the process for installing the GPU drivers, CUDA, cuDNN and TensorFlow. However, whenever I try to run my program (which runs fine on a supercomputing cluster) I keep getting:
undefined symbol: cudnnCreate
I have added the following to my ~/.bashrc:
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64:/usr/local/cuda-8.0/lib64"
export CUDA_HOME="/usr/local/cuda-8.0"
export PATH="$PATH:/usr/local/cuda-8.0/bin"
but it still does not work and produces the same error.
Answering my own question: the issue was not that the library was missing; the installed library was the wrong version (cuDNN 5.0 in this case), so the symbol could not be found. However, even after installing the right version it still didn't work, because of incompatibilities between the versions of the driver, CUDA and cuDNN. I solved all of these issues by reinstalling everything, including the driver, according to the requirements of the TensorFlow libraries.
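A quick way to see which cuDNN the dynamic loader actually resolves is to load the library with ctypes and call cudnnGetVersion, which is part of the cuDNN C API. This is a minimal sketch; the default name libcudnn.so and the helper name cudnn_version are assumptions, and the helper returns None when the library or the symbol cannot be found.

```python
import ctypes


def cudnn_version(library_name="libcudnn.so"):
    """Return the integer reported by cudnnGetVersion, or None.

    None means the loader could not open library_name or the
    library does not export cudnnGetVersion.
    """
    try:
        lib = ctypes.CDLL(library_name)
    except OSError:
        return None
    try:
        get_version = lib.cudnnGetVersion
    except AttributeError:
        return None
    # cudnnGetVersion returns a size_t.
    get_version.restype = ctypes.c_size_t
    return int(get_version())


if __name__ == "__main__":
    print("cuDNN version reported by the loader:", cudnn_version())
```

Comparing the reported number with the cuDNN version required by the installed TensorFlow build shows whether LD_LIBRARY_PATH points at the intended copy.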

Inception v3 guide on tensorflow broken for C++ and python

I'm following the guide here on running the pretrained inception v3 https://www.tensorflow.org/versions/r0.11/tutorials/image_recognition/index.html
However, when I try the python version, I get:
python classify_image.py
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
Traceback (most recent call last):
  File "classify_image.py", line 227, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
TypeError: run() got an unexpected keyword argument 'argv'
Ok.. Fine nevermind let me try the C++ Version.
Downloaded the model, run the bazel command:
➜ tensorflow git:(master) ✗ bazel build tensorflow/examples/label_image/...
.......
ERROR: /storage/git/tensorflow/tensorflow/tensorflow.bzl:636:21: syntax error at '=': expected expression.
ERROR: /storage/git/tensorflow/tensorflow/tensorflow.bzl:711:1: nested functions are not allowed. Move the function to top-level.
ERROR: /storage/git/tensorflow/tensorflow/tensorflow.bzl:739:1: nested functions are not allowed. Move the function to top-level.
ERROR: /storage/git/tensorflow/tensorflow/tensorflow.bzl:773:1: nested functions are not allowed. Move the function to top-level.
ERROR: /storage/git/tensorflow/tensorflow/tensorflow.bzl:776:1: nested functions are not allowed. Move the function to top-level.
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package '': Extension 'tensorflow/tensorflow.bzl' has errors.
INFO: Elapsed time: 0.600s
...Okay then. Neither seems to work. Or perhaps I'm doing this wrong. Does anyone have any guidance? :)
Using tensorflow 0.11 on Ubuntu 16, Anaconda distribution python 3.5
Thanks!
If it helps anyone:
Solving the C++ problem: update Bazel to the correct version (you likely installed TensorFlow ages ago and then git pulled the latest source, which requires a newer Bazel version).
Solving the Python problem: remove the argv argument from the tf.app.run call.
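If the same script has to work with both older and newer TensorFlow releases, one option is to pass argv only when the installed run function accepts it. The sketch below does this with inspect.signature; the helper name call_app_run is an assumption for illustration, and run stands for tf.app.run.

```python
import inspect


def call_app_run(run, main, argv):
    """Call run(main=..., argv=...) if run accepts argv, else run(main=...)."""
    parameters = inspect.signature(run).parameters
    if "argv" in parameters:
        return run(main=main, argv=argv)
    # Older signature without argv: let run fall back to its own handling.
    return run(main=main)
```

In classify_image.py this would replace the failing line with call_app_run(tf.app.run, main, [sys.argv[0]] + unparsed).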
