GPU memory nearly full after defining tf.distribute.MirroredStrategy? [duplicate] - python

This question already has answers here:
How to prevent tensorflow from allocating the totality of a GPU memory?
(16 answers)
Closed 8 months ago.
I am coming across a strange issue when using TensorFlow (2.9.1). After defining a distributed training strategy, my GPU memory appears to fill.
Steps to reproduce are simple:
import tensorflow as tf
strat = tf.distribute.MirroredStrategy()
After the first line (importing TensorFlow), nvidia-smi outputs:
Fri Jun 10 03:01:47 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01 Driver Version: 470.103.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro P6000 Off | 00000000:04:00.0 Off | Off |
| 26% 25C P8 9W / 250W | 0MiB / 24449MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Quadro P6000 Off | 00000000:06:00.0 Off | Off |
| 26% 20C P8 7W / 250W | 0MiB / 24449MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
After the second line of code, nvidia-smi outputs:
Fri Jun 10 03:02:43 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01 Driver Version: 470.103.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro P6000 Off | 00000000:04:00.0 Off | Off |
| 26% 29C P0 59W / 250W | 23951MiB / 24449MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Quadro P6000 Off | 00000000:06:00.0 Off | Off |
| 26% 25C P0 58W / 250W | 23951MiB / 24449MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1833720 C python 23949MiB |
| 1 N/A N/A 1833720 C python 23949MiB |
+-----------------------------------------------------------------------------+
The GPU memory is almost entirely full, even though nothing has been trained yet. There is also some terminal output:
2022-06-10 03:02:37.442336: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-10 03:02:39.136390: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 23678 MB memory: -> device: 0, name: Quadro P6000, pci bus id: 0000:04:00.0, compute capability: 6.1
2022-06-10 03:02:39.139204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 23678 MB memory: -> device: 1, name: Quadro P6000, pci bus id: 0000:06:00.0, compute capability: 6.1
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')
Any ideas on why this is occurring would be helpful! Additional details about my configuration:
Python 3.10.4 [GCC 7.5.0] on linux
tensorflow 2.9.1
cuda/11.2.2 cudnn/v8.2.1

By default, TensorFlow maps nearly all of the memory on every visible GPU: official guide. This is done for performance reasons: claiming the memory up front avoids the fragmentation and allocation overhead that growing the pool on demand would cause.
You can use tf.config.experimental.set_memory_growth to prevent TensorFlow from immediately claiming all of the memory. There are also some good explanations in this StackOverflow post.
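For example, a minimal sketch (memory growth must be enabled for every visible GPU before TensorFlow initializes them, i.e. before the strategy is created):
import tensorflow as tf

# Allocate GPU memory on demand instead of claiming nearly all of it up front.
# This has to run before any GPU is touched.
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

strat = tf.distribute.MirroredStrategy()
With this in place, nvidia-smi should show only a small allocation per GPU right after the strategy is created, growing as tensors are actually placed on the devices.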

Related

How to reset memory usage of GPU when training model on server

I am training an LSTM model, but the time taken by one epoch is too high. If I check the memory usage with nvidia-smi, I get the following output, where all of the available memory is assigned.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:02:00.0 Off | 0 |
| N/A 38C P0 58W / 250W | 15967MiB / 16280MiB | 71% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-PCIE... Off | 00000000:82:00.0 Off | 0 |
| N/A 30C P0 25W / 250W | 0MiB / 16280MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 10871 C ...tify/aidentify/bin/python 15965MiB |
+-----------------------------------------------------------------------------+
I tried to kill the process by its PID, but then the kernel restarts, and if I start training the model again it uses all the memory and training is slow again.
This is the warning I get when I use tf.keras.utils.timeseries_dataset_from_array:
I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 15389 MB memory: -> device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:02:00.0, compute capability: 6.0
Is it possible to reset the memory usage, or to run on GPU 1, which is available? Kindly help me resolve this issue.
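A common workaround (a minimal sketch, assuming GPU 1 really is free) is to expose only the idle GPU to the process before TensorFlow is imported, so the busy card is never touched:
import os

# Hide GPU 0 from this process; TensorFlow will then only see and allocate on GPU 1.
# Must be set before importing TensorFlow.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import tensorflow as tf

# Optionally also allocate memory on demand rather than all at once.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)
The memory on GPU 0 itself is only released when the process holding it (PID 10871 above) exits.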

Nvidia imaginaire: CUDA out of memory

I am using Nvidia imaginaire for a university project and have the problem that I always get the error:
"RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 3.95 GiB total capacity; 1.54 GiB already allocated; 189.50 MiB free; 2.40 GiB reserved in total by PyTorch)"
I am a little bit lost as to what else I can do to free space besides decreasing the batch_size and even decreasing my resolution.
Here is some info about my GPU and CUDA:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01 Driver Version: 470.63.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| 35% 32C P8 17W / 170W | 355MiB / 4039MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
I am using this config .yaml file: https://github.com/NVlabs/imaginaire/blob/master/configs/projects/fs_vid2vid/face_forensics/ampO1.yaml
Could it be that 4 GB of VRAM are not enough? Can I somehow prevent the system from reserving 2.40 GiB for PyTorch?
Thank you in advance and best wishes!
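On the "reserved in total by PyTorch" figure: that is PyTorch's caching allocator holding freed blocks for reuse, not memory that is permanently lost. A minimal sketch for inspecting and releasing the cache (it cannot create capacity beyond the card's roughly 4 GiB, so a smaller batch size or resolution may still be needed):
import torch

# Memory held by live tensors vs. memory cached by the allocator.
print(f"allocated: {torch.cuda.memory_allocated() / 2**20:.0f} MiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 2**20:.0f} MiB")

# Hand cached-but-unused blocks back to the driver.
torch.cuda.empty_cache()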

How to automatically select an idle GPU for model training in tensorflow?

I am using the Nvidia prebuilt docker container NVIDIA Release 20.12-tf2 to run my experiment. I am using TensorFlow version 2.3.1. Currently, I am running my model on one GPU, and I still have 3 more idle GPUs, so I intend to run my alternative experiment on any of the idle GPUs. Here is the output of nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:6A:00.0 Off | 0 |
| N/A 70C P0 71W / 70W | 14586MiB / 15109MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla T4 Off | 00000000:6B:00.0 Off | 0 |
| N/A 39C P0 27W / 70W | 212MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla T4 Off | 00000000:6C:00.0 Off | 0 |
| N/A 41C P0 28W / 70W | 212MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla T4 Off | 00000000:6D:00.0 Off | 0 |
| N/A 41C P0 28W / 70W | 212MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
update: prebuilt container
I'm using the Nvidia prebuilt container as follows:
docker run -ti --rm --gpus all --shm-size=1024m -v /home/hamilton/data:/data nvcr.io/nvidia/tensorflow:20.12-tf2-py3
To utilize the idle GPUs for my other experiments, I tried adding the following to my Python script:
attempt-1
import tensorflow as tf
devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(devices[0], True)
but this attempt gave me the following error:
raise ValueError("Memory growth cannot differ between GPU devices")
ValueError: Memory growth cannot differ between GPU devices
I googled this error, but none of the fixes discussed on GitHub work for me.
attempt-2
I also tried this:
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
but this attempt also gave me an error like this:
Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
People have discussed this error on GitHub, but I am still not able to get rid of it on my side.
latest attempt:
I also tried parallel training with TensorFlow and added the following to my Python script:
device_type = "GPU"
devices = tf.config.experimental.list_physical_devices(device_type)
devices_names = [d.name.split("e:")[1] for d in devices]
strategy = tf.distribute.MirroredStrategy(devices=devices_names[:3])
with strategy.scope():
    opt = Adam(learning_rate=0.1)
    model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
but this also gave me an error and the program stopped.
Can anyone help me automatically select idle GPUs for model training in TensorFlow? Does anyone know a workable approach? What's wrong with my attempts above? Any ideas on how to utilize the idle GPUs while the program is already running on one of them?
Thanks to @HernánAlarcón's suggestion, I tried the following and it worked like a charm:
docker run -ti --rm --gpus device=1,3 --shm-size=1024m -v /home/hamilton/data:/data nvcr.io/nvidia/tensorflow:20.12-tf2-py3
This may not be an elegant solution, but it worked like a charm. I am open to other possible remedies for this sort of problem.
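A Python-level alternative to the docker flag (a sketch, assuming GPUs 1 and 3 are the idle ones, as in the command above) is to restrict TensorFlow's visible devices before any GPU work happens:
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
idle = [gpus[1], gpus[3]]  # indices of the idle cards (adjust to your machine)

# Make only the idle GPUs visible to this process; must run before any GPU op.
tf.config.experimental.set_visible_devices(idle, 'GPU')
for gpu in idle:
    tf.config.experimental.set_memory_growth(gpu, True)

strategy = tf.distribute.MirroredStrategy()  # now mirrors across the idle GPUs only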

Understanding GPU processes from `nvidia-smi` command

I have a new laptop with an NVIDIA RTX 2070 GPU that I'm using to train TensorFlow 2.1 models. Unfortunately, I'm having GPU OOM (out of memory) issues: it crashes in the middle of training. I have reduced memory usage a lot, but of course the problem still persists.
I tried to check what is causing the GPU to go OOM. When I type nvidia-smi in the terminal, I get the following output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00 Driver Version: 440.64.00 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2070 On | 00000000:01:00.0 Off | N/A |
| N/A 43C P8 6W / N/A | 1009MiB / 7982MiB | 11% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1209 G /usr/lib/xorg/Xorg 72MiB |
| 0 1237 G /usr/bin/gnome-shell 52MiB |
| 0 1565 G /usr/lib/xorg/Xorg 481MiB |
| 0 1693 G /usr/bin/gnome-shell 195MiB |
| 0 12312 G ...uest-channel-token=14048285025818334832 204MiB |
+-----------------------------------------------------------------------------+
What are these processes about? Can you help me interpret and understand them? Is there anything I can kill, and how?
All of these processes are X Window display processes. If your laptop has low-power onboard graphics (many do, for power saving), you can configure it to drive the display, which frees up the discrete card for computing.

RuntimeError: Error while calling cudaGetDevice(&the_device_id) in file gpu_data.cpp:201. code: 30, reason: unknown error

I am getting the following error while initializing the face detector model dlib.cnn_face_detection_model_v1(path):
RuntimeError: Error while calling cudaGetDevice(&the_device_id) in file gpu_data.cpp:201. code: 30, reason: unknown error
I can get around this error by simply restarting the machine. Since that isn't practical if it occurs frequently in the future, I would like to understand its cause. With a little research I found that it is somewhat related to the driver version, but what exactly goes wrong during initialization is still a question for me.
The information below may be useful:
>>> nvidia-smi -a
==============NVSMI LOG==============
Timestamp : Tue Feb 18 15:15:45 2020
Driver Version : 410.129
CUDA Version : 10.0
Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : GeForce GTX 1070 with Max-Q Design
Product Brand : GeForce
Display Mode : Enabled
Display Active : Enabled
Persistence Mode : Enabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
>>> nvidia-smi
Tue Feb 18 15:21:48 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.129 Driver Version: 410.129 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 107... On | 00000000:01:00.0 On | N/A |
| N/A 48C P0 31W / N/A | 483MiB / 8117MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 3465 G /usr/lib/xorg/Xorg 18MiB |
| 0 3597 G /usr/bin/gnome-shell 48MiB |
| 0 6012 G /usr/lib/xorg/Xorg 213MiB |
| 0 6148 G /usr/bin/gnome-shell 199MiB |
+-----------------------------------------------------------------------------+
I am looking for an explanation of this error; a solution is more than welcome.
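Not an explanation of the root cause, but a quick sanity check (a sketch; it only verifies that dlib's CUDA build can still reach the driver) before loading the model:
import dlib

# True only when dlib was compiled with CUDA support.
print("CUDA-enabled build:", dlib.DLIB_USE_CUDA)

# If this call itself fails with a similar 'unknown error', the driver/runtime
# state is broken (commonly after suspend/resume or a driver update), and the
# model initialization is not the real culprit.
print("visible CUDA devices:", dlib.cuda.get_num_devices())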
