I've created a conda environment and installed TensorFlow as follows:
conda create -n foo python=3.10
conda activate foo
conda install mamba
mamba install tensorflow -c conda-forge
mamba install cudnn cudatoolkit
This installed TensorFlow 2.10.0. I've also installed CUDA 11.2 and cuDNN 8.1, and I then try to run the following:
import tensorflow as tf
print(f"GPUs available: {tf.config.list_physical_devices('GPU')}")
but it just returns an empty list. I have a 3060 Ti that I want to use for my ML projects, but TensorFlow is not detecting it. I found similar questions to mine, like this, this and this, but they use the old version of TensorFlow, which would install tensorflow-gpu and is no longer supported. How can I fix this, or at least troubleshoot it?
I'm using a Windows 10 machine.
Output of nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 528.24 Driver Version: 528.24 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... WDDM | 00000000:09:00.0 On | N/A |
| 30% 43C P8 16W / 200W | 809MiB / 8192MiB | 3% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 7176 C+G ...perience\NVIDIA Share.exe N/A |
| 0 N/A N/A 9240 C+G C:\Windows\explorer.exe N/A |
| 0 N/A N/A 12936 C+G ...cw5n1h2txyewy\LockApp.exe N/A |
| 0 N/A N/A 13652 C+G ...e\PhoneExperienceHost.exe N/A |
| 0 N/A N/A 14020 C+G ...2txyewy\TextInputHost.exe N/A |
| 0 N/A N/A 14888 C+G ...ser\Application\brave.exe N/A |
| 0 N/A N/A 15112 C+G ...5n1h2txyewy\SearchApp.exe N/A |
| 0 N/A N/A 16516 C+G ...oft OneDrive\OneDrive.exe N/A |
| 0 N/A N/A 18296 C+G ...aming\Spotify\Spotify.exe N/A |
| 0 N/A N/A 18624 C+G ...in7x64\steamwebhelper.exe N/A |
| 0 N/A N/A 18672 C+G ...\app-1.0.9010\Discord.exe N/A |
| 0 N/A N/A 18828 C+G ...lPanel\SystemSettings.exe N/A |
| 0 N/A N/A 19284 C+G ...Central\Razer Central.exe N/A |
| 0 N/A N/A 20020 C+G ...arp.BrowserSubprocess.exe N/A |
| 0 N/A N/A 22912 C+G ...8wekyb3d8bbwe\Cortana.exe N/A |
| 0 N/A N/A 24848 C+G ...ontend\Docker Desktop.exe N/A |
| 0 N/A N/A 25804 C+G ...y\ShellExperienceHost.exe N/A |
| 0 N/A N/A 27064 C+G ...8bbwe\WindowsTerminal.exe N/A |
+-----------------------------------------------------------------------------+
Output of nvcc -V:
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_22:08:44_Pacific_Standard_Time_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
I ran the following dummy code:
import tensorflow as tf
import numpy as np
def make_nn():
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Dense(1, input_shape=(1,)))
    model.compile(loss='mean_squared_error', optimizer='sgd')
    return model

def dataset():
    x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
    return tf.data.Dataset.from_tensor_slices((x, y)).batch(1)

def main():
    model = make_nn()
    model.fit(dataset(), epochs=1, steps_per_epoch=9)

if __name__ == '__main__':
    print(f"GPUs available: {tf.config.list_physical_devices('GPU')}")
    print(f"Built with cuda: {tf.test.is_built_with_cuda()}")
    main()
and it gave me the following log:
GPUs available: []
Built with cuda: False
2023-02-06 09:47:32.744450: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-06 09:47:32.779280: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
Looks like it's using a CPU-only build.
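For completeness, a quick way to confirm whether the installed wheel was compiled with CUDA at all (a minimal check, not part of the original question, assuming the same TensorFlow 2.10 environment as above):
import tensorflow as tf

# The build configuration of the installed package; a CPU-only wheel
# reports is_cuda_build = False and has no usable cuda_version entry.
info = tf.sysconfig.get_build_info()
print(f"is_cuda_build: {info.get('is_cuda_build')}")
print(f"cuda_version:  {info.get('cuda_version', 'n/a')}")
print(f"cudnn_version: {info.get('cudnn_version', 'n/a')}")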
Probably not the best solution, but I downgraded TensorFlow back to version 2.6.0, which was previously installed, and it worked. It's a bummer because I wanted to try some more recent features, but for the time being this will have to suffice. If anyone is facing the same issues, this is the current conda environment that I'm using.
If you use conda-forge, you may need to set the environment variable CONDA_OVERRIDE_CUDA to force installing the GPU-enabled version of TensorFlow, as explained here: https://conda-forge.org/docs/user/tipsandtricks.html#installing-cuda-enabled-packages-like-tensorflow-and-pytorch. Under bash that would be something like
CONDA_OVERRIDE_CUDA="11.2" conda install "tensorflow==2.8" -c conda-forge
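Once the environment is rebuilt this way, a quick runtime check (a minimal sketch, not part of the original answer) is to list the GPUs and place one op on the card:
import tensorflow as tf

# A CUDA-enabled build should list the card and place this matmul on /GPU:0.
gpus = tf.config.list_physical_devices("GPU")
print(f"GPUs available: {gpus}")

if gpus:
    with tf.device("/GPU:0"):
        x = tf.random.normal((1000, 1000))
        print(tf.matmul(x, x).device)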
Related
I am training an LSTM model, but the time taken by one epoch is too high. If I check the memory usage using nvidia-smi, I get the following, where all the available memory has been allocated.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:02:00.0 Off | 0 |
| N/A 38C P0 58W / 250W | 15967MiB / 16280MiB | 71% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-PCIE... Off | 00000000:82:00.0 Off | 0 |
| N/A 30C P0 25W / 250W | 0MiB / 16280MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 10871 C ...tify/aidentify/bin/python 15965MiB |
+-----------------------------------------------------------------------------+
I tried to kill the process with that PID, but then the kernel restarts, and if I start training the model again it uses all the memory and the training is slow again.
Warning I get when I use tf.keras.utils.timeseries_dataset_from_array:
I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 15389 MB memory: -> device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:02:00.0, compute capability: 6.0
Is it possible to reset the memory usage, or to run on the other GPU (GPU 1), which is idle? Kindly help me resolve this issue.
My PC
Microsoft Windows [Version 10.0.22621.963]
(c) Microsoft Corporation. All rights reserved.
C:\Users\donhu>nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_19:00:59_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
C:\Users\donhu>nvidia-smi
Sat Dec 17 23:40:44 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 512.77 Driver Version: 512.77 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... WDDM | 00000000:01:00.0 On | N/A |
| 34% 31C P8 16W / 125W | 1377MiB / 6144MiB | 4% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3392 C+G C:\Windows\explorer.exe N/A |
| 0 N/A N/A 4484 C+G ...artMenuExperienceHost.exe N/A |
| 0 N/A N/A 6424 C+G ...n1h2txyewy\SearchHost.exe N/A |
| 0 N/A N/A 6796 C+G ...lPanel\SystemSettings.exe N/A |
| 0 N/A N/A 7612 C+G ...8bbwe\WindowsTerminal.exe N/A |
| 0 N/A N/A 9700 C+G ...8bbwe\WindowsTerminal.exe N/A |
| 0 N/A N/A 10624 C+G ...perience\NVIDIA Share.exe N/A |
| 0 N/A N/A 10728 C+G ...er Java\jre\bin\javaw.exe N/A |
| 0 N/A N/A 13064 C+G ...8bbwe\WindowsTerminal.exe N/A |
| 0 N/A N/A 14496 C+G ...462.46\msedgewebview2.exe N/A |
| 0 N/A N/A 17124 C+G ...ooting 2\BugShooting2.exe N/A |
| 0 N/A N/A 19064 C+G ...8bbwe\Notepad\Notepad.exe N/A |
| 0 N/A N/A 19352 C+G ...8bbwe\WindowsTerminal.exe N/A |
| 0 N/A N/A 20920 C+G ...y\ShellExperienceHost.exe N/A |
| 0 N/A N/A 21320 C+G ...e\PhoneExperienceHost.exe N/A |
| 0 N/A N/A 21368 C+G ...me\Application\chrome.exe N/A |
+-----------------------------------------------------------------------------+
C:\Users\donhu>
I train a model:
from tensorflow import keras
from tensorflow.keras import layers

def get_model():
    model = keras.Sequential([
        layers.Dense(512, activation="relu"),
        layers.Dense(10, activation="softmax")
    ])
    model.compile(optimizer="rmsprop",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = get_model()
history_noise = model.fit(
    train_images_with_noise_channels, train_labels,
    epochs=10,
    batch_size=128,
    validation_split=0.2)

model = get_model()
history_zeros = model.fit(
    train_images_with_zeros_channels, train_labels,
    epochs=10,
    batch_size=128,
    validation_split=0.2)
Source code: https://github.com/donhuvy/deep-learning-with-python-notebooks/blob/master/chapter05_fundamentals-of-ml.ipynb
How to use GPU with TensorFlow?
Using the GPU should be automatic for TensorFlow; it seems that you are missing some of the required components (citing the TensorFlow web page):
The following NVIDIA® software are only required for GPU support.
NVIDIA® GPU drivers version 450.80.02 or higher.
CUDA® Toolkit 11.2.
cuDNN SDK 8.1.0.
(Optional) TensorRT to improve latency and throughput for inference.
See their complete list here:
https://www.tensorflow.org/install/pip#software_requirements
After installing all of these, TensorFlow should work fine and report that it found a capable GPU device. Also note that when downloading the packages, you need mutually matching versions of CUDA and cuDNN.
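Once those components are installed, a quick check (a hedged sketch, not part of the original answer) that TensorFlow actually sees the card:
import tensorflow as tf

# Confirms the installed wheel is a CUDA build and that a GPU is visible.
print(f"Built with CUDA: {tf.test.is_built_with_cuda()}")

gpus = tf.config.list_physical_devices("GPU")
print(f"GPUs: {gpus}")

# Device name and compute capability, if a GPU was found.
if gpus:
    print(tf.config.experimental.get_device_details(gpus[0]))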
If you use a virtual env:
CMD run as Administrator (important: run with Administrator role)
cd /d D:\temp20221103\
py --list
py -3.10 -m venv vy310
vy310\Scripts\activate
py -V
jupyter lab
!pip install tensorflow
!pip install cuda-python
!pip install nvidia-pyindex
!pip install nvidia-cudnn
!pip install tensorflow-gpu
import tensorflow as tf
tf.sysconfig.get_build_info()
tf.sysconfig.get_build_info()["cuda_version"]
Result:
OrderedDict([('cpu_compiler',
'C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/HostX64/x64/cl.exe'),
('cuda_compute_capabilities',
['sm_35', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'compute_80']),
('cuda_version', '64_112'),
('cudart_dll_name', 'cudart64_112.dll'),
('cudnn_dll_name', 'cudnn64_8.dll'),
('cudnn_version', '64_8'),
('is_cuda_build', True),
('is_rocm_build', False),
('is_tensorrt_build', False),
('msvcp_dll_names', 'msvcp140.dll,msvcp140_1.dll'),
('nvcuda_dll_name', 'nvcuda.dll')])
As shown in your screenshot, you are using Anaconda. You need to install
cudatoolkit
cudnn
then
tf.debugging.set_log_device_placement(True)
import tensorflow as tf
tf.debugging.set_log_device_placement(False)
# mirrored_strategy = tf.distribute.MirroredStrategy(devices=["/cpu:0", "/gpu:1"])
mirrored_strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0"])
This question already has answers here:
How to prevent tensorflow from allocating the totality of a GPU memory?
I am coming across a strange issue when using TensorFlow (2.9.1). After defining a distributed training strategy, my GPU memory appears to fill.
Steps to reproduce are simple:
import tensorflow as tf
strat = tf.distribute.MirroredStrategy()
After the first line (importing TensorFlow), nvidia-smi outputs:
Fri Jun 10 03:01:47 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01 Driver Version: 470.103.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro P6000 Off | 00000000:04:00.0 Off | Off |
| 26% 25C P8 9W / 250W | 0MiB / 24449MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Quadro P6000 Off | 00000000:06:00.0 Off | Off |
| 26% 20C P8 7W / 250W | 0MiB / 24449MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
After the second line of code, nvidia-smi outputs:
Fri Jun 10 03:02:43 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01 Driver Version: 470.103.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro P6000 Off | 00000000:04:00.0 Off | Off |
| 26% 29C P0 59W / 250W | 23951MiB / 24449MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Quadro P6000 Off | 00000000:06:00.0 Off | Off |
| 26% 25C P0 58W / 250W | 23951MiB / 24449MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1833720 C python 23949MiB |
| 1 N/A N/A 1833720 C python 23949MiB |
+-----------------------------------------------------------------------------+
Why is the GPU memory almost entirely full? There is also some terminal output:
2022-06-10 03:02:37.442336: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-10 03:02:39.136390: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 23678 MB memory: -> device: 0, name: Quadro P6000, pci bus id: 0000:04:00.0, compute capability: 6.1
2022-06-10 03:02:39.139204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 23678 MB memory: -> device: 1, name: Quadro P6000, pci bus id: 0000:06:00.0, compute capability: 6.1
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')
Any ideas on why this is occurring would be helpful! Additional details about my configuration:
Python 3.10.4 [GCC 7.5.0] on linux
tensorflow 2.9.1
cuda/11.2.2 cudnn/v8.2.1
By default, TensorFlow maps almost all of your GPU memory (see the official guide). This is done for performance reasons: by allocating the memory up front, it avoids the latency that incremental memory growth would typically cause.
You can try using tf.config.experimental.set_memory_growth to prevent it from immediately filling up all of its memory. There are also some good explanations in this StackOverflow post.
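A minimal sketch of that approach (memory growth has to be requested before the GPUs are initialized, i.e. before the first op or strategy is created):
import tensorflow as tf

# Must run before anything touches the GPUs (e.g. before MirroredStrategy).
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

strat = tf.distribute.MirroredStrategy()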
I am using the NVIDIA prebuilt docker container NVIDIA Release 20.12-tf2 to run my experiment, with TensorFlow version 2.3.1. Currently I am running my model on one of the GPUs; I still have 3 more idle GPUs, so I intend to run an alternative experiment on one of the idle GPUs. Here is the output of nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:6A:00.0 Off | 0 |
| N/A 70C P0 71W / 70W | 14586MiB / 15109MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla T4 Off | 00000000:6B:00.0 Off | 0 |
| N/A 39C P0 27W / 70W | 212MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla T4 Off | 00000000:6C:00.0 Off | 0 |
| N/A 41C P0 28W / 70W | 212MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla T4 Off | 00000000:6D:00.0 Off | 0 |
| N/A 41C P0 28W / 70W | 212MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Update: prebuilt container
I'm using the NVIDIA prebuilt container as follows:
docker run -ti --rm --gpus all --shm-size=1024m -v /home/hamilton/data:/data nvcr.io/nvidia/tensorflow:20.12-tf2-py3
To utilize the idle GPUs for my other experiments, I tried adding the following to my Python script:
attempt-1
import tensorflow as tf
devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(devices[0], True)
but this attempt gave me the following error:
raise ValueError("Memory growth cannot differ between GPU devices")
ValueError: Memory growth cannot differ between GPU devices
I googled this error, but none of the fixes discussed on GitHub worked for me.
attempt-2
I also tried this:
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
but this attempt also gave me an error like this:
Error occurred when finalizing GeneratorDataset iterator: Failed
precondition: Python interpreter state is not initialized. The process
may be terminated.
People have discussed this error on GitHub, but I am still not able to get rid of it on my side.
latest attempt:
I also tried parallel training with TensorFlow and added the following to my Python script:
device_type = "GPU"
devices = tf.config.experimental.list_physical_devices(device_type)
devices_names = [d.name.split("e:")[1] for d in devices]
strategy = tf.distribute.MirroredStrategy(devices=devices_names[:3])
with strategy.scope():
    opt = Adam(learning_rate=0.1)
    model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
but this also gave me an error and the program stopped.
Can anyone help me automatically select idle GPUs for training in TensorFlow? Does anyone know a workable approach? What's wrong with my attempts above? Any ideas on how to utilize the idle GPUs while the program is already running on one of them?
Thanks to HernánAlarcón's suggestion, I tried the following and it worked like a charm:
docker run -ti --rm --gpus device=1,3 --shm-size=1024m -v /home/hamilton/data:/data nvcr.io/nvidia/tensorflow:20.12-tf2-py3
This may not be an elegant solution, but it worked like a charm. I am open to other possible remedies for this sort of problem.
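As a further hedged sketch (my addition, not part of the original answer), the same restriction can be done from inside the script rather than at the docker level, either by setting CUDA_VISIBLE_DEVICES before TensorFlow initializes or via the TensorFlow API:
import tensorflow as tf

# Must run before TensorFlow initializes the GPUs.
gpus = tf.config.experimental.list_physical_devices("GPU")
if len(gpus) >= 4:
    # Use only GPUs 1 and 3; GPU 0 (busy with the other experiment)
    # is never touched by this process.
    tf.config.experimental.set_visible_devices([gpus[1], gpus[3]], "GPU")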
I have a new laptop with an NVIDIA RTX 2070 GPU that I'm using to train TensorFlow 2.1 models. Unfortunately, I'm having GPU OOM (out of memory) issues: it crashes in the middle of training. I've reduced RAM usage a lot, but of course the problem still persists.
I tried to check what is causing the GPU to go OOM. When I type nvidia-smi in the terminal, I get the following output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00 Driver Version: 440.64.00 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2070 On | 00000000:01:00.0 Off | N/A |
| N/A 43C P8 6W / N/A | 1009MiB / 7982MiB | 11% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1209 G /usr/lib/xorg/Xorg 72MiB |
| 0 1237 G /usr/bin/gnome-shell 52MiB |
| 0 1565 G /usr/lib/xorg/Xorg 481MiB |
| 0 1693 G /usr/bin/gnome-shell 195MiB |
| 0 12312 G ...uest-channel-token=14048285025818334832 204MiB |
+-----------------------------------------------------------------------------+
What are these processes about? Can you help me interpret and understand them? Is there anything I can kill, and how?
All of the processes are X-windows display processes. If your laptop has some sleazy onboard video (many do, for power saving), you can configure it to use that to drive the display, and that will free up the fancy card for computing.
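Independently of the display processes, one hedged mitigation for the OOM crashes themselves (my addition, not part of the original answer) is to stop TensorFlow from grabbing the whole card up front, so its allocator competes less with the X/display processes shown above:
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    # Either grow GPU memory on demand...
    tf.config.experimental.set_memory_growth(gpus[0], True)
    # ...or cap TensorFlow to a fixed budget (pick one of the two, not both);
    # here a hypothetical 6 GB limit leaves headroom for the display:
    # tf.config.experimental.set_virtual_device_configuration(
    #     gpus[0],
    #     [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=6144)])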