I recently ran:
apt-get update
apt-get upgrade
on Ubuntu 18.04. I noticed that it upgraded some nvidia related packages.
After the upgrade, tensorflow became extremely slow. Before the upgrade, training a test network took 75 seconds; now it takes about 15 minutes.
My versions:
cuda 10.0
nvidia driver 415.27
Cuda compilation tools release 9.1, V9.1.85
In tensorflow conda env:
cudatoolkit 9.2
cudnn 7.2.1
python 3.6.8
tensorflow/tensorflow-base/tensorflow-gpu 1.12.0
I have tried many things to fix this, including a new conda environment just for tensorflow, other gpu drivers (390, 410), and re-installing the gpu drivers.
I don't know how to find the root of the problem. I am using a GTX 1080 Ti. Is there some kind of benchmark I can run?
I tried to run the tensorflow cnn benchmark but that requires tf_nightly_gpu which doesn't support cuda 10.0 yet.
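One rough alternative to the full CNN benchmark is to time a single large matrix multiplication on each device. The following is only a sketch under assumptions, not the official benchmark: time_call and matmul_benchmark are hypothetical helper names, and the TensorFlow part assumes the 2.x eager API (on 1.12 the same idea would need a tf.Session).

```python
import time


def time_call(fn, repeats=5, warmup=1):
    """Return the best wall-clock time in seconds over `repeats` calls of fn()."""
    for _ in range(warmup):
        fn()  # warm-up runs absorb one-off costs such as kernel compilation
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def matmul_benchmark(device, size=4096):
    """Time a size x size matmul on `device` (e.g. "/CPU:0" or "/GPU:0").

    Assumes TensorFlow 2.x. The import is lazy so that time_call stays
    usable on a machine without TensorFlow.
    """
    import tensorflow as tf

    with tf.device(device):
        a = tf.random.normal((size, size))
        b = tf.random.normal((size, size))
        # .numpy() copies the result to the host, which forces the device work to finish
        return time_call(lambda: tf.linalg.matmul(a, b).numpy())
```

Comparing matmul_benchmark("/CPU:0") with matmul_benchmark("/GPU:0") gives a quick sanity check: if the GPU number is not clearly lower, the ops are probably not running on the GPU at all.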
I'm trying to use tensorflow with my PC's GPU (Nvidia RTX 3070Ti) in a python-conda environment. I'm solving a small image-classification problem from kaggle. I've solved it in Google Colab, but now I'm interested in solving it on my local machine. However, TF doesn't work properly locally and I have no idea why. I've read tons of solutions but none has helped yet.
I'm following this guide and always install proper versions of TF and CUDA: https://www.tensorflow.org/install/source_windows
cuda-toolkit 10.1, cudnn 7.6, tf-gpu 2.3, python 3.8
I've also installed the latest Nvidia drivers for the video card.
What I've tried:
I've installed the proper versions of the CUDA-toolkit and cuDNN from the nvidia site. I've installed them properly and added everything that was needed to PATH. I've checked it - MS Visual Studio finds both CUDA and cuDNN and can work with them. I've installed the proper version of Tensorflow-GPU using conda into my environment.
Result: TF can't find my GPU and uses only CPU.
I've removed all CUDA and cuDNN drivers. I've installed the CUDA-toolkit, cuDNN and Tensorflow-GPU python packages into my conda environment.
Result: TF recognizes my GPU and uses it! But during DNN training this error appears: Failed to launch ptxas Relying on driver to perform ptx compilation. Modify $PATH to customize ptxas location. Training also goes very badly - accuracy is very low and doesn't improve.
When I use exactly the same code and data on Google Colab, everything goes smoothly - I get ~90% accuracy on the 5th epoch.
I've tried tf 2.1 with the relevant cuda and cudnn, but the result is still the same!
I've tried to install cudatoolkit-dev, but it didn't help to solve ptxas problem.
I'm about to give up and use PyTorch instead of Tensorflow.
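The ptxas warning quoted above says TF could not launch the ptxas binary and suggests changing PATH. A minimal sketch for locating it from Python follows. The helper names are hypothetical, and the environment variables (CONDA_PREFIX, CUDA_PATH, CUDA_HOME) are assumptions about the setup that may not be set on a given machine. Whether fixing the warning also fixes the low accuracy is not something this sketch can tell.

```python
import os
import shutil


def find_ptxas(extra_dirs=()):
    """Return the full path of the ptxas executable, or None if it is not found.

    Searches PATH first, then each directory in `extra_dirs`.
    """
    found = shutil.which("ptxas")
    if found:
        return found
    for directory in extra_dirs:
        found = shutil.which("ptxas", path=str(directory))
        if found:
            return found
    return None


def candidate_cuda_bin_dirs():
    """Guess likely CUDA bin directories from common environment variables.

    These variables are assumptions; none of them is guaranteed to exist.
    """
    dirs = []
    for var in ("CONDA_PREFIX", "CUDA_PATH", "CUDA_HOME"):
        root = os.environ.get(var)
        if root:
            dirs.append(os.path.join(root, "bin"))
            dirs.append(os.path.join(root, "Library", "bin"))  # conda layout on Windows
    return dirs


def add_to_path(directory):
    """Prepend `directory` to PATH for the current process only."""
    os.environ["PATH"] = directory + os.pathsep + os.environ.get("PATH", "")
```

If find_ptxas(candidate_cuda_bin_dirs()) returns a path, calling add_to_path(os.path.dirname(path)) before importing tensorflow makes that directory visible to the process. If it returns None, ptxas is probably not installed in any of the guessed locations.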
So here is what worked for me:
Create a Python 3.9 environment
Install cuda and tensorflow packages from "Esri":
conda install -c esri cudatoolkit
conda install -c esri cudnn
conda install -c esri tensorflow-gpu
Then install tensorflow-hub:
conda install -c conda-forge tensorflow-hub
It will downgrade installations from previous steps, but it works. Maybe installing tensorflow-hub first could help to avoid it, but I didn't test it.
Tensorflow on gpu is new to me. First naive question: am I correct in assuming that I can use a gpu (nv gtx 1660ti) to run tensorflow ml operations while it simultaneously runs my monitor? I only have one gpu card in my pc. I assume it can do both at the same time, or do I require a dedicated gpu for tensorflow only that is not connected to any monitor?
All on ubuntu 21.10. I have set up nvidia-toolkit, cudnn, tensorflow, tensorflow-gpu in a conda env, and all appears to work fine: 1 gpu visible, built with cudnn 11.6.r11.6, tf version 2.8.0, python version 3.7.10, all in a conda env running on a jupyter notebook. All seems to run fine until I attempt to train a model, and then I get this message:
2022-03-19 04:42:48.005029: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8302
and then the kernel just locks up and crashes. BTW the code worked prior to installing gpu, when it simply used cpu. Is this simply a version mismatch somewhere between python, tensorflow, tensorflow-gpu, cudnn versions or something more sinister? Thx. J.
am I correct in assuming that I can use a GPU (nv gtx 1660ti) to run
tensorflow ml operations, while it simultaneously runs my monitor?
Yes. You can run nvidia-smi on ubuntu to see how much free memory you have and which processes are using the GPU.
Only have one GPU card in my pc, assume it can do both at the same time?
Yes, it can. Most people do the same; a training process on the GPU is similar to running a game (but more memory hungry).
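The nvidia-smi check can also be scripted. The sketch below uses nvidia-smi's CSV query mode and parses the result. The function names are hypothetical, and the sample numbers in the usage note are made up for illustration.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Parse lines like "812, 5944" into a list of (used_mib, total_mib) tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory():
    """Return per-GPU (used, total) memory in MiB, or None if nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_memory_csv(out)
```

For example, parse_memory_csv("812, 5944") yields [(812, 5944)]: the desktop is already holding 812 MiB, and the rest is what a training job can use.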
About the problem:
install based on this version table.
Check your driver version with nvidia-smi. For the CUDA version that is actually installed, run nvcc -V (the CUDA version shown by nvidia-smi is only the maximum CUDA version the driver supports).
Then just run pip install tensorflow-gpu; this will also install keras for you.
Check whether tensorflow has access to the GPU as follows:
import tensorflow as tf
tf.test.is_gpu_available()  # should return True (deprecated in TF 2.x; the list_physical_devices check is preferred)
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
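The check above can be extended into a small report that also shows which CUDA and cuDNN versions the installed TensorFlow build was compiled against. This is a sketch assuming TensorFlow 2.1 or later; gpu_report is a hypothetical helper name, and tf.sysconfig.get_build_info() is only available from TF 2.3, so the code falls back to empty values on older versions.

```python
import importlib.util


def gpu_report():
    """Collect basic GPU visibility facts from TensorFlow 2.x, if it is installed."""
    if importlib.util.find_spec("tensorflow") is None:
        return {"tensorflow_installed": False}

    import tensorflow as tf

    report = {
        "tensorflow_installed": True,
        "tf_version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }
    # get_build_info() exists from TF 2.3; the CUDA/cuDNN keys are only present on GPU builds
    build_info = getattr(tf.sysconfig, "get_build_info", lambda: {})()
    report["cuda_version"] = build_info.get("cuda_version")
    report["cudnn_version"] = build_info.get("cudnn_version")
    return report
```

An empty "gpus" list together with built_with_cuda being True usually points at a driver or library mismatch rather than at a CPU-only package.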
install based on this version table.
That was the key for me. Had the same issue: CPU worked fine, GPU would dump out during model fit with an exit code but no error. The matrix will show you that tensorflow 2.5 - 2.8 work with CUDA 11.2 and cudnn 8.1; the 'latest' versions are 11.5 and 8.4 as of 05/2022. I rolled back both versions and everything is working fine.
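The lookup against the version table can be sketched in a few lines. The table below is a partial, hand-copied excerpt (the 2.5 - 2.8 rows come from the comment above, the others from the same tested-configuration table as I recall it), so treat the official table as the authority. check_versions is a hypothetical helper name.

```python
# Partial excerpt: TensorFlow major.minor -> (CUDA, cuDNN) it was built and tested with.
TESTED_CONFIGS = {
    "2.1": ("10.1", "7.6"),
    "2.2": ("10.1", "7.6"),
    "2.3": ("10.1", "7.6"),
    "2.4": ("11.0", "8.0"),
    "2.5": ("11.2", "8.1"),
    "2.6": ("11.2", "8.1"),
    "2.7": ("11.2", "8.1"),
    "2.8": ("11.2", "8.1"),
}


def check_versions(tf_version, cuda_version, cudnn_version):
    """Return a list of human-readable mismatches (empty if everything matches).

    Versions are compared on major.minor only, so "8.1.0" matches "8.1".
    """
    def major_minor(version):
        return ".".join(version.split(".")[:2])

    key = major_minor(tf_version)
    if key not in TESTED_CONFIGS:
        return [f"no table entry for TensorFlow {key}"]
    want_cuda, want_cudnn = TESTED_CONFIGS[key]
    problems = []
    if major_minor(cuda_version) != want_cuda:
        problems.append(f"CUDA {cuda_version} installed, {want_cuda} expected")
    if major_minor(cudnn_version) != want_cudnn:
        problems.append(f"cuDNN {cudnn_version} installed, {want_cudnn} expected")
    return problems
```

With the versions from the comment above, check_versions("2.8.0", "11.5", "8.4") reports both a CUDA and a cuDNN mismatch, while check_versions("2.8.0", "11.2", "8.1") returns an empty list.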
The matrix will show you that tensorflow 2.5 - 2.8 work with CUDA 11.2 and cudnn 8.1
I believe the problem is that CUDA 11.2 is not available for Windows 11.
I want to run the project using Anaconda, TensorFlow 2.3, Keras 2.4.3 (CNN example). OS Windows 10.
I installed Visual Studio 2019 Community Edition, CUDA 10.1 and cudnn 8.0.5 for CUDA 10.1.
Using Anaconda I created an environment with TensorFlow (tensorflow-gpu didn't help), Keras, matplotlib, scikit-learn. I tried to run it on CPU but it takes a lot of time (20 minutes for just 1 epoch out of 35).
I need to run it using GPU, but TensorFlow doesn't see my GPU device (GeForce GTX 1060). Can someone help me find the problem? I tried to solve the problem using this guide tensorflow but it didn't help me.
This works 100%, no need to install anything manually (cuda for example)
conda create --name tf_gpu tensorflow-gpu
Ok so I tried to install all the components into new anaconda environment. But instead of "conda install tensorflow-gpu" I decided to write "pip install tensorflow-gpu" and now it works via GPU...
Just a heads up, the Cudnn version you were trying to use was incompatible.
Listing Versions and compatible CUDA+Cudnn
You can go here and then scroll down to the bottom to see what versions of CUDA and Cudnn were used to build TensorFlow.
I am trying to install Tensorflow 2.2 (or later) on Windows 10. According to the official Tensorflow installation guide, Python 3.8 support requires TensorFlow 2.2 or later. I installed Anaconda with python 3.8 and then tried to install tensorflow using conda install -c anaconda tensorflow but it displays 2 errors:
My Python version is not compatible (although the tensorflow page says the contrary).
My CUDA version is 11.0 (but I installed the 10.1 version as specified in the tensorflow installation guide).
The error messages are shown in this picture.
Additionally I tried using only conda install tensorflow but it displays the same messages as before.
I also tried doing the installation both inside and outside the environment I created named sstensorflow but it doesn't work.
Regarding the second error message, I used nvcc --version to check the installed version of the CUDA driver and it says it is version 10.1 as shown in this picture.
So I don't know why my computer reports CUDA 10.1 while the tensorflow install says I have CUDA 11.0, and I also don't know what the error regarding my python version is. Please help me.
I had a similar problem. Had to go back to python 3.7. The other issue is that when it says "Your installed version is 11.0", I believe it is referring to your GPU card driver, not the CUDA version. I had to find a driver version compatible with CUDA 10.1. I have an RTX 2070 GPU and the driver version I have is 26.21.14.3200. Go to the Nvidia site and search for a driver for your GPU card that is compatible with CUDA 10.1.
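To see the two numbers side by side, the outputs of nvcc --version and nvidia-smi can be parsed. This is a sketch with hypothetical function names. It assumes the usual "release X.Y" wording in the nvcc output and the "Driver Version: ... CUDA Version: ..." header in nvidia-smi; older drivers may not print a CUDA version at all, in which case the parser returns None for it.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract the installed toolkit version from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def parse_smi_versions(text):
    """Extract (driver_version, max_supported_cuda) from the nvidia-smi header."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    return (driver.group(1) if driver else None, cuda.group(1) if cuda else None)


def run(cmd):
    """Run a command and return its stdout, or None if the tool is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True).stdout
```

Feeding run(["nvcc", "--version"]) into parse_nvcc_release gives the toolkit that is installed, while parse_smi_versions on run(["nvidia-smi"]) gives the driver version and the highest CUDA version that driver supports. The two CUDA numbers differing is normal as long as the toolkit version is not higher than the driver's maximum.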
How do I install TensorFlow 2.2 with an Nvidia Geforce GTX 1650 with Anaconda (on Windows 10)?
I want to know whether these anaconda commands will work instead of manually installing all the required files like the CUDA toolkit, cuDNN and TensorRT (for TensorFlow version 2.2 GPU).
$ conda create --name Test tensorflow-gpu==2.1
$ conda activate Test
$ pip3 install tensorflow-gpu==2.2
Note: I will manually download the GPU driver as recommended on tensorflow official website!
I run pip3 install tensorflow-gpu (for version 2.2) because, according to TensorFlow's official website, both 2.1 and 2.2 use the same CUDA and cuDNN versions.
Right now it looks like Anaconda's highest version of Tensorflow is 2.1. If you want 2.2 you'll need to install tensorflow gpu, cuda, and cudnn manually.