Tensorflow 2.7 GPU Memory not released - python

I am currently working on 1D convolutional neural networks for time series classification. Recently, I got CUDA working on my GeForce 3080 (which was a pain in itself). However, I noticed some strange behavior when using TensorFlow and CUDA: after training a model, the GPU memory is not released, even after deleting the variables and running garbage collection. I tried resetting the TF graph and closing the TF sessions, but the GPU memory stays allocated. As a result, cross-validation crashes and I have to restart my Python environment every time I want to make changes and retrain my model.
After a tedious search, I found that people were already struggling with this five years ago. However, I am now using TF 2.7 on Ubuntu 20.04.3. Some of my colleagues are on Windows and are not experiencing these problems; in particular, it seems they have no issues with models failing to retrain because memory is already allocated.
I found the workaround of using multiple processes, but I wasn't able to get it to work for my model with 10-fold cross-validation.
Since the issue has been open for more than five years and my colleagues have no problems, I am wondering whether I am doing something wrong. An issue like this has very likely been fixed after five years, which is why I think my code is the problem here.
Is there any solution or guide for TF 2.7 and GPU memory allocation?
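For reference, the per-process workaround mentioned above usually looks something like the sketch below: each fold trains in a child process, so the driver releases that fold's GPU memory as soon as the process exits. The model architecture, the dummy data, and the 10-fold split here are placeholders, not code from the question.

    # Hedged sketch of the multi-process workaround: train each CV fold in a
    # child process so the GPU memory is released when that process exits.
    import multiprocessing as mp
    import numpy as np
    from sklearn.model_selection import KFold

    def train_one_fold(train_idx, val_idx, x, y, result_queue):
        # Import TensorFlow inside the child so CUDA is initialized (and later
        # torn down) per process, never in the parent.
        import tensorflow as tf
        model = tf.keras.Sequential([
            tf.keras.layers.Conv1D(32, 3, activation="relu", input_shape=x.shape[1:]),
            tf.keras.layers.GlobalAveragePooling1D(),
            tf.keras.layers.Dense(2, activation="softmax"),
        ])
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(x[train_idx], y[train_idx], epochs=5, verbose=0)
        _, acc = model.evaluate(x[val_idx], y[val_idx], verbose=0)
        result_queue.put(acc)

    if __name__ == "__main__":
        # Dummy data standing in for a real time-series dataset.
        x = np.random.rand(200, 128, 1).astype("float32")
        y = np.random.randint(0, 2, size=200)
        ctx = mp.get_context("spawn")   # "spawn" avoids inheriting CUDA state
        scores = []
        for train_idx, val_idx in KFold(n_splits=10).split(x):
            q = ctx.Queue()
            p = ctx.Process(target=train_one_fold, args=(train_idx, val_idx, x, y, q))
            p.start()
            scores.append(q.get())      # read before join to avoid queue deadlocks
            p.join()                    # GPU memory is freed when the child exits
        print("mean CV accuracy:", np.mean(scores))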

Related

Utilizing hardware AI accelerators with PyTorch

I'm pretty new to Stack Overflow, and also to using PyTorch. I'm an AI and CS major, and I'm working on a project that involves processing video with ML models. I won't go into the details because I want any answers to this question to be generally accessible to others using PyTorch, but the issue is that I'm currently using PyTorch with VapourSynth, accelerating both with CUDA, and I'm looking into purchasing an AI accelerator like this:
Amazon
Documentation on using these with TensorFlow is pretty easy to find, but I'm having trouble working out how I can use one of these with PyTorch. Does anybody have experience with this? I'd simply like to be able to use this card to accelerate training a neural net.
It is correct that you would need to convert your code to run on XLA, but that only involves changing a few lines of code. Please refer to the README at https://github.com/pytorch/xla for references and guides. With a few modifications you can get a significant training speedup.
I think the experience of using PyTorch on a TPU would be less smooth than on an NVIDIA GPU. As far as I know, you have to use XLA to convert PyTorch models so that they can run on a TPU.
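For illustration, the XLA conversion for a single device usually amounts to selecting the XLA device and flushing the lazily built graph each step, as in the pytorch/xla README; a minimal sketch with a placeholder model and random batches:

    # Minimal single-device sketch following the pytorch/xla README pattern.
    import torch
    import torch.nn as nn
    import torch_xla.core.xla_model as xm

    device = xm.xla_device()            # the attached XLA accelerator (e.g. a TPU core)
    model = nn.Linear(10, 2).to(device) # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(100):
        x = torch.randn(32, 10, device=device)         # placeholder batch
        y = torch.randint(0, 2, (32,), device=device)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        xm.mark_step()                  # materialize the pending XLA graph for this step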

Tensorflow model can only achieve good results on one computer, fails everywhere else

I have a TensorFlow/Keras model that I am training on a synthetic classification task.
When training the model using my laptop, the model achieves 99.9% accuracy and loss values around 1e-8.
However, when I train the model on a different machine, the accuracy plateaus at 80% and the loss is stuck at 3e-1. I have reproduced the failure on my own server and Google Colab.
Now, since the issue appears to be that my laptop is configured differently, I am trying to find out what this difference is.
I have made sure that on both machines:
Python version is 3.7
Nvidia driver is 460.x.x
CUDA version is 11.2
Tensorflow version is 2.4 and is installed from pip
Numpy version is 1.19.5
Scipy version is 1.4.1
The laptop has an i7-7700HQ and a NVIDIA GeForce GTX 1050 Mobile.
The server has a Xeon Silver 4116 and several GPUs: TITAN Xp, TITAN V, GeForce RTX 2080 SUPER, TITAN V (I have tried all of them).
The problem happens both on CPU and GPU. Precision is set to float32 in all cases.
The code that is being run is exactly the same.
I cannot share the code, but I can say that it uses tf.math.segment_sum, which is a non-deterministic op (I don't know whether that is relevant).
I am at a complete loss here. I have tried looking at every possible discrepancy between the two configurations, but I could not find any. The fact that this issue happens also on CPU is what really blows my mind.
What could the problem be?
I hope this qualifies as a programming question since it's related to TensorFlow specifically. If not, I apologize in advance and will ask elsewhere.
Thanks
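One way to narrow this down is to pin every seed and print the exact runtime environment on each machine, then diff the printouts. A minimal sketch (it does not depend on the unshared model code):

    # Fix all seeds so remaining differences come from the environment rather
    # than initialization or shuffling, then dump version and device info.
    import os, platform, random
    import numpy as np
    import scipy
    import tensorflow as tf

    os.environ["PYTHONHASHSEED"] = "0"
    random.seed(0)
    np.random.seed(0)
    tf.random.set_seed(0)

    build = tf.sysconfig.get_build_info()
    print("python    :", platform.python_version())
    print("tensorflow:", tf.__version__)
    print("numpy     :", np.__version__)
    print("scipy     :", scipy.__version__)
    print("GPUs      :", tf.config.list_physical_devices("GPU"))
    print("CUDA build:", build.get("cuda_version"))
    print("cuDNN     :", build.get("cudnn_version"))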

Is there any way to train numpy neural networks faster?

I implemented a neural network class using only Python and NumPy, and I want to do some experiments with it. The problem is that it takes too long to train. My computer has neither a high-end GPU nor a particularly powerful CPU, so I thought about some sort of 'cloud training'.
I know libraries such as TensorFlow or PyTorch use backends to train neural networks faster, and I was wondering if something similar could be achieved with numpy. Is there a way to run numpy in the cloud?
Even something slow that doesn't use GPUs would be fine for me. I tried loading my files into Google Colab, but it didn't work very well: it stopped running due to inactivity after some time.
Is there any nice solution out there?
Thanks for reading it all!
Try using CuPy instead of NumPy: it runs on the GPU (and works well on a Colab GPU instance), and you should only need to make a few small modifications to your code.
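To give an idea of how small the modifications typically are, a minimal sketch of the swap (the array shapes and the toy ReLU layer are placeholders):

    # CuPy mirrors most of the NumPy API, so the math code is often unchanged;
    # only array creation and the transfer back to the host differ.
    import numpy as np
    import cupy as cp

    x_host = np.random.rand(256, 784).astype(np.float32)   # e.g. a training batch
    w_host = np.random.rand(784, 128).astype(np.float32)   # a weight matrix

    x = cp.asarray(x_host)              # move to the GPU
    w = cp.asarray(w_host)

    z = x @ w                           # same expressions as in the NumPy version,
    a = cp.maximum(z, 0.0)              # now executed on the GPU (ReLU here)
    grad_w = x.T @ (a > 0).astype(cp.float32)   # toy backward-style product

    grad_w_host = cp.asnumpy(grad_w)    # copy back to the host when needed
    print(grad_w_host.shape)            # (784, 128)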

Why only one of the GPU pair has a nonzero GPU utilization under a Tensorflow / keras job?

I have just started working on a fairly large dataset with my new and shiny GTX 1080. I have installed the drivers as well as CUDA, cuDNN, etc.
Observation:
When I run nvidia-smi I get the following output:
[screenshot of nvidia-smi output]
Both of the GPUs are using a lot of memory (that is good, I think?), but the GPU utilization is very low for both of them, especially the second GPU.
Why is this happening?
Does anyone know why the GPU utilization is so low, and are there any tips on how I can improve performance?
First of all, the nvidia-smi output may be misleading.
Have a look at the discussion: nvidia-smi Volatile GPU-Utilization explanation?
Second: unfortunately, Keras doesn't provide an out-of-the-box solution for using multiple GPUs. With TensorFlow as the backend, Keras automatically uses one GPU. You have to use TensorFlow directly if you want to make use of both GPUs.
If you are using Keras version 1.x.x you can try this solution: Transparent Multi-GPU Training on TensorFlow with Keras.
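That solution targets Keras 1.x. If you are on tf.keras (TensorFlow 2.x), multi-GPU data parallelism is exposed through tf.distribute.MirroredStrategy; a minimal sketch with a placeholder model and random data:

    # Hedged TF 2.x sketch: MirroredStrategy replicates the model across all
    # visible GPUs and splits each batch between them.
    import numpy as np
    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()        # picks up all visible GPUs
    print("replicas in sync:", strategy.num_replicas_in_sync)

    with strategy.scope():                             # build and compile inside the scope
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
            tf.keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy")

    x = np.random.rand(1024, 20).astype("float32")     # placeholder data
    y = np.random.randint(0, 2, size=(1024, 1))
    model.fit(x, y, epochs=2, batch_size=128)          # each batch is split across GPUs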

Tensorflow execution in a virtual machine with no GPUs

I have a question regarding TensorFlow that is somewhat critical to the task I'm trying to accomplish.
My scenario is as follows:
1. I have a TensorFlow script that has been set up, trained, and tested. It is working well. The training and testing were done on a dev box with two Titan X cards.
2. We now need to port this system to a live-pilot testing stage and are required to deploy it on a virtual machine running Ubuntu 14.04.
Here lies the problem: a VM will not have access to the underlying GPUs and must validate the incoming data in CPU-only mode. My questions:
1. Will the absence of GPUs hinder the validation process of my ML system? Does TensorFlow use GPUs for CNN computation by default, and will the absence of a GPU affect execution?
2. How do I run my script in CPU-only mode?
3. Will setting CUDA_VISIBLE_DEVICES to none help with validation in CPU-only mode after the system has been trained on GPU boxes?
I'm sorry if this comes across as a noob question, but I am new to TF and any advice would be much appreciated. Please let me know if you need any further information about my scenario.
Testing with CUDA_VISIBLE_DEVICES set to an empty string will make sure that you don't have anything that depends on a GPU being present, and in theory that should be enough. In practice, there are some bugs in the GPU code path that can be triggered when there are no GPUs (like this one), so you also want to make sure your GPU software environment (CUDA version) is the same.
Alternatively, you could compile TensorFlow without GPU support (bazel build -c opt tensorflow); this way you don't have to worry about matching CUDA environments or setting CUDA_VISIBLE_DEVICES.
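For completeness, a minimal sketch of the CUDA_VISIBLE_DEVICES approach; the device-placement logging call is the TF 2.x spelling (older versions used tf.ConfigProto(log_device_placement=True) on the session config):

    import os

    # Hide all GPUs from CUDA. This must happen *before* TensorFlow is imported
    # (or set it when launching: CUDA_VISIBLE_DEVICES="" python script.py),
    # otherwise the GPU devices may already have been initialized.
    os.environ["CUDA_VISIBLE_DEVICES"] = ""

    import tensorflow as tf

    # All ops, including the CNN layers, now fall back to their CPU kernels.
    # Optional: log device placement to confirm everything lands on the CPU.
    tf.debugging.set_log_device_placement(True)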
