When I run a Python script from the command line, for example python test.py, the GPU memory is released as soon as the script finishes.
In this test.py script I simply load a built Keras model to evaluate and predict on some data; there is no training step in it.
However, if I open Spyder and run the same script there, the results appear in the IPython console, but when I then type nvidia-smi on the command line, the GPU memory has not been released.
What I tried is closing this IPython kernel and starting a new one, but then all my other variables are lost. Is there a decent way to release the GPU memory after model.evaluate(x, y) from Spyder?
Here are some screenshots, taken before and after running the script from Spyder:
By default, the TensorFlow backend reserves essentially all of the memory on the GPU. It may not actually use all of it, but it keeps it allocated so other programs cannot use it until the TensorFlow backend is terminated. So in nvidia-smi you will see that the memory is not released even after TensorFlow has freed it internally within its own framework.
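If you are on the TF 1.x Keras backend (an assumption; the exact versions aren't stated), two things that usually help from inside Spyder are letting TensorFlow allocate memory on demand instead of grabbing it all up front, and clearing the Keras session after evaluation. A minimal sketch:

import tensorflow as tf
from keras import backend as K

# Allocate GPU memory on demand instead of reserving it all up front (TF 1.x API)
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))

# ... load the model and run model.evaluate(x, y) ...

# Drop the Keras graph and session state; note that TensorFlow's allocator may
# still hold some of the reserved memory until the Python process exits
K.clear_session()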
Generally, to run pip without a cache we use --no-cache-dir, like
pip install pytorch --no-cache-dir.
I downloaded a CNN model I want to use from GitHub.
The first two commands,
python generate_dataset.py --is_train=True --use_phase=True --chip_size=100 --patch_size=94 --use_phase=True --dataset=soc
python generate_dataset.py --is_train=False --use_phase=True --chip_size=128 --patch_size=128 --use_phase=True --dataset=soc
executed successfully. But when running
python train.py --config_name=config/AConvNet-SOC.json
it gives a MemoryError.
The author of the above repository used 32 GB of RAM and an 11 GB GPU, but I have 8 GB of RAM and an 8 GB GPU.
Here is what I have tried:
I thought of running it without a cache, like
python train.py --config_name=config/AConvNet-SOC.json --no-cache-dir
But it throws the error below:
FATAL Flags parsing error: Unknown command line flag 'no-cache-dir' Pass --helpshort or --helpfull to see help on flags.
I think this is because the no-cache-dir argument is not defined in the script via absl.flags. Does Python support a no-cache-directory option at all?
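For context, absl.flags only accepts flags that the script itself declares, which is why an unknown --no-cache-dir is rejected. A minimal sketch of that pattern (the batch_size flag here is illustrative, not taken from the repository):

from absl import app, flags

FLAGS = flags.FLAGS

# Only flags declared like this are accepted on the command line; anything
# else (e.g. --no-cache-dir) triggers "Unknown command line flag".
flags.DEFINE_string('config_name', 'config/AConvNet-SOC.json', 'Path to the config file')
flags.DEFINE_integer('batch_size', 16, 'Training batch size')


def main(_):
    print(FLAGS.config_name, FLAGS.batch_size)


if __name__ == '__main__':
    app.run(main)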
I was able to get it running by decreasing the number of epochs and the batch_size, but I want to run it for the full number of epochs.
PyTorch's zero_grad() resets the gradients for every minibatch so that the GPU won't run out of memory, but it is already used in the code I am running, in _base.py. Is there any way I can leverage this further?
How can I resolve this?
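One way to keep the effective batch size while lowering peak GPU memory is gradient accumulation: run several small minibatches, call optimizer.step() only every few of them, and zero the gradients after each update. A minimal PyTorch sketch with toy stand-ins for the model and data (the real ones live in train.py / _base.py):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins; replace with the repository's model, loss and data pipeline
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
loader = DataLoader(dataset, batch_size=4)   # small per-step batch to save GPU memory

accum_steps = 4   # effective batch size = 4 * 4 = 16

optimizer.zero_grad()
for i, (inputs, targets) in enumerate(loader):
    loss = criterion(model(inputs), targets) / accum_steps   # scale so the sum matches one big batch
    loss.backward()                                          # gradients accumulate across iterations
    if (i + 1) % accum_steps == 0:
        optimizer.step()         # one parameter update per accum_steps minibatches
        optimizer.zero_grad()    # clear the accumulated gradients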
I am training a 3D Siamese network in PyTorch. When I run the code from an IPython (v7.15.0) terminal, the GPU RAM usage maxes out at 1739M:
When I copy the same code into a Jupyter notebook (in Jupyter Lab v2.1.5), the GPU RAM usage is 10209M:
Jupyter Lab was run from the terminal in the same Python virtual environment.
First, I don't understand why running the script in Jupyter Lab would increase GPU RAM usage by a factor of almost 6.
Second, and related: is there any way to have Jupyter Lab run in a mode that uses somewhere in the range of 1739M of GPU RAM? I love the ability to have all the "documentation" around the code and output.
Python version 3.6.9.
OK, now I've realized what the difference was between the two runs.
I have two GPUs in the machine: a Quadro M2000 that drives the display and the Titan XP. When I ran JupyterLab from the command line I launched it as jupyter lab, but when I ran IPython I launched it as CUDA_VISIBLE_DEVICES=0 ipython --pylab. Without CUDA_VISIBLE_DEVICES it gave me warnings about mismatched GPUs; I had seen those warnings in the IPython terminal previously, but I didn't see them when I ran JupyterLab.
So it is still odd that the memory usage on the Titan XP would jump to over 10G.
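If the goal is to make Jupyter Lab behave like the IPython run, the simplest fix is to restrict it to one GPU in the same way, either by launching it as CUDA_VISIBLE_DEVICES=0 jupyter lab or by setting the variable in the notebook before touching CUDA. A minimal sketch (assuming the Titan XP is device 0):

import os

# Must be set before CUDA is initialized, so put it in the first cell,
# before importing torch or calling anything that touches the GPU
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())       # should now report 1
print(torch.cuda.get_device_name(0))   # should be the Titan XP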
I've just installed a new GPU (an RTX 2070) in my machine alongside the old GPU. I wanted to see if PyTorch picked it up, so following the instructions here: How to check if pytorch is using the GPU?, I ran the following commands (Python 3.6.9, Linux Mint Tricia 19.3):
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.current_device()
Killed
>>> torch.cuda.get_device_name(0)
Killed
Both of the Killed calls took some time, and one of them froze the machine for half a minute or so. Does anyone have any experience with this? Are there some setup steps I'm missing?
If I understand correctly, you would like to list the available CUDA devices. This can be done via nvidia-smi (not a PyTorch function), and both your old GPU and the RTX 2070 should show up, as devices 0 and 1. In PyTorch, if you want to send data to one specific device, you can do device = torch.device("cuda:0") for GPU 0 and device = torch.device("cuda:1") for GPU 1. While the code is running, you can check nvidia-smi to see the memory usage and running processes for each GPU.
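For completeness, the same check can be done from inside PyTorch with standard torch.cuda calls; a minimal sketch:

import torch

print(torch.cuda.is_available())            # True if a usable CUDA device is found
print(torch.cuda.device_count())            # should be 2 with both cards installed

for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i)) # e.g. the old GPU and the RTX 2070

device = torch.device("cuda:0")             # move a tensor to a specific device
x = torch.randn(3, 3).to(device)
print(x.device)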
To anyone seeing this down the line: although I had the NVIDIA driver set up, I needed to set up a couple of other things as well, such as CUDA and cuDNN. The best article I found on the subject was https://hackernoon.com/up-and-running-with-ubuntu-nvidia-cuda-cudnn-tensorflow-and-pytorch-a54ec2ec907d.
I have installed Keras with GPU support in R, based on TensorFlow with GPU support, following these steps:
https://towardsdatascience.com/installing-tensorflow-with-cuda-cudnn-and-gpu-support-on-windows-10-60693e46e781
If I run the Boston Housing example code from the book Deep Learning with R, I get this screen:
Can I conclude that the code runs on the GPU?
Or is this line from the picture above giving an error:
GPU libraries are statically linked, skip dlopen check.
While the code is running, the GPU sits at only about 3% utilization while the CPU is at 20-25%.
The code is NOT running faster than when I initially ran it without GPU support.
Thank you!
Yes, TensorFlow is running with the GPU enabled. Boston Housing is a relatively small dataset and probably does not benefit from the GPU to a large degree. The line below indicates it is running on the GPU: "Created tensorflow device (/job:localhost/replica:0/task:0/device:GPU:0".
From the guide at Tensorflow
You can set tf.debugging.set_log_device_placement(True) in order to see explicitly where each operation is running. The R equivalent is below.
library(tensorflow)
tf$debugging$set_log_device_placement(TRUE)
I want to train a 5-layer DNN using TensorFlow in a Jupyter Notebook. It performs well with normal training.
But when I use cross-validation to find a good dropout rate, Jupyter reports during training that the kernel has died.
The Jupyter log:
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
My code is here.
Googling suggests it may be running out of memory. I tried reducing the batch size, but the error still occurs.
The code runs on Ubuntu 16.04 with 32 GB of RAM and a 1080 Ti GPU. The environment is Python 3.5 with tensorflow (1.3.0) and tensorflow-gpu (1.3.0).
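Without seeing the code, one common cause of this in TF 1.x cross-validation loops is building a fresh graph and session for every fold without releasing the previous ones, so memory grows until the kernel dies. A minimal sketch of the pattern, with a toy stand-in for the 5-layer DNN:

import tensorflow as tf

def build_model(dropout_rate):
    # Toy stand-in for the real 5-layer DNN
    x = tf.placeholder(tf.float32, [None, 10])
    y = tf.placeholder(tf.float32, [None, 1])
    h = tf.layers.dense(x, 32, activation=tf.nn.relu)
    h = tf.layers.dropout(h, rate=dropout_rate, training=True)
    pred = tf.layers.dense(h, 1)
    loss = tf.losses.mean_squared_error(y, pred)
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
    return x, y, loss, train_op

for dropout_rate in [0.2, 0.3, 0.5]:
    tf.reset_default_graph()                 # drop graph nodes left over from the previous fold
    x, y, loss, train_op = build_model(dropout_rate)
    with tf.Session() as sess:               # the session is closed (and its memory released) when the block exits
        sess.run(tf.global_variables_initializer())
        # ... run the training loop for this fold with sess.run([loss, train_op], ...) ...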