How can I train an XGBoost model on a GPU but run predictions on CPU without allocating any GPU RAM?
My situation: I create an XGBoot model (tree_method='gpu_hist') in Python with predictor='cpu_predictor', then I train it on GPU, then I save (pickle) it to disk, then I read the model from disk, then I use it for predictions.
My problem: once the model starts doing predictions, even though I run it on CPU, it still allocates some small amount of GPU RAM (around ~289MB). This is a problem for the following reasons:
I run multiple copies of the model to parallelize predictions and if I run too many, the prediction processes crash.
I can not use GPU for training other models, if I run predictions on the same machine at the same time.
So, how can one tell XGBoost to not allocate any GPU RAM and use CPU and regular RAM only for predictions?
Thank you very much for your help!
Related
If there's a 8G-RAM GPU, and has loaded a model that takes all the 8G RAM, is it possible to run multiple model prediction/inference in parallel?
or you can only run a prediction at a same time period
If your single model uses all 8gb of RAM, then it would not be possible to run another model in parallel using the same resources. You would have to allocate more memory or schedule the second model to run afterwards.
If I have a single GPU with 8GB RAM and I have a TensorFlow model (excluding training/validation data) that is 10GB, can TensorFlow train the model?
If yes, how does TensorFlow do this?
Notes:
I'm not looking for distributed GPU training. I want to know about single GPU case.
I'm not concerned about the training/validation data sizes.
No you can not train a model larger than your GPU's memory. (there may be some ways with dropout that I am not aware of but in general it is not advised). Further you would need more memory than even all the parameters you are keeping because your GPU needs to retain the parameters along with the derivatives for each step to do back-prop.
Not to mention the smaller batch size this would require as there is less space left for the dataset.
From what I see, most people seem to be initializing an entire model, and sending the whole thing to the GPU. But I have a neural net model that is too big to fit entirely on my GPU. Is it possible to keep the model saved in ram, but run all the operations on the GPU?
I do not believe this is possible. However, one easy work around would be to split you model into sections that will fit into gpu memory along with your batch input.
Send the first part(s) of the model to gpu and calculate outputs
Release the former part of the model from gpu memory, and send the next section of the model to the gpu.
Input the output from 1 into the next section of the model and save outputs.
Repeat 1 through 3 until you reach your models final output.
I am using keras-rl to train my network with the D-DQN algorithm. I am running my training on the GPU with the model.fit_generator() function to allow data to be sent to the GPU while it is doing backprops. I suspect the generation of data to be too slow compared to the speed of processing data by the GPU.
In the generation of data, as instructed in the D-DQN algorithm, I must first predict Q-values with my models and then use these values for the backpropagation. And if the GPU is used to run these predictions, it means that they are breaking the flow of my data (I want backprops to run as often as possible).
Is there a way I can specify on which device to run specific operations? In a way that I could run the predictions on the CPU and the backprops on the GPU.
Maybe you can save the model at the end of the training. Then start another python file and write os.environ["CUDA_VISIBLE_DEVICES"]="-1"before you import any keras or tensorflow stuff. Now you should be able to load the model and make predictions with your CPU.
It's hard to properly answer your question without seeing your code.
The code below shows how you can list the available devices and force tensorflow to use a specific device.
def get_available_devices():
local_device_protos = device_lib.list_local_devices()
return [x.name for x in local_device_protos]
get_available_devices()
with tf.device('/gpu:0'):
//Do GPU stuff here
with tf.device('/cpu:0'):
//Do CPU stuff here
I trained a network and saved the model as a hdf5 file. I am using the Keras library for training. But the file is very large ~ 3.6GB. I am unable to continue the training process, because the program throws an "Out of memory" error. I reduced the batch size from 128 to 8, and the same error is thrown. I would like to retrain the existing model when I encounter new data/Want to train for a few more epochs.
Using 12GB GPU RAM, 65GB CPU RAM.
2GB training images, 1GB training labels.
Any way to counter this?