I am creating checkpoints so that I can resume training later.
checkpoint = ModelCheckpoint('model.h5', monitor='val_loss', verbose=1, save_best_only=True, mode='min')
But when I try to resume training, loading model.h5 is very slow.
from keras.models import load_model
model = load_model('model.h5', custom_objects={'GroupNormalization': GroupNormalization}, compile=False)
Is there a way to solve this?
The .h5 format is one of the fastest for loading large files, but there are a couple of points to note when loading weights:
Are you using a normal HDD?
Are you using GPUs?
If not, the weights are loaded into RAM, and that loading/unloading is CPU-intensive work; an older processor might take a while to load the file.
Saving a model with ModelCheckpoint without save_weights_only=True saves the optimizer state as well. You will probably notice that the saved file is much bigger than a file containing only the weights.
Bigger files are slower to load, especially with a slow CPU. Colab uses a single-core CPU on GPU instances, so it is really slow.
If, for now, you only want to resume your training, use save_weights_only=True and, on resuming, build the model and call model.load_weights; this should be faster. Note, however, that the optimizer state will be reset.
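For example, a rough sketch of that approach (assuming a hypothetical build_model() helper that recreates the same architecture, including the GroupNormalization layers):

from keras.callbacks import ModelCheckpoint

# during training: save only the weights of the best model
checkpoint = ModelCheckpoint('weights.h5', monitor='val_loss', verbose=1,
                             save_best_only=True, save_weights_only=True, mode='min')

# when resuming: rebuild the model, then load just the weights
model = build_model()             # hypothetical helper that recreates the architecture
model.load_weights('weights.h5')  # much smaller file than the full model.h5, so faster to load
model.compile(optimizer='adam', loss='categorical_crossentropy')  # optimizer state starts fresh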
I am using a subset of the PlantVillage (image) dataset on my Google Drive and trying to train CNN models on that data from Google Colab (and of course, I use a GPU). The problem is that the first epoch of training goes very slowly because the data is being loaded into the GPU for the first time; the later epochs run much faster and in a predictable time frame. Is it possible to do the loading prior to the training and exclude it from the timing? I want to %%time my training, and having this extra loading time counted as training time messes things up.
I use TensorFlow and Keras applications for data preprocessing and model training.
You can use Dataset.cache() and Dataset.prefetch(), which keep the data in memory after it is first loaded from disk and overlap data preparation with training, which speeds up the later epochs considerably.
Check the code below:
AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
Please have a look at this link for your reference.
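If you want that first-epoch loading kept out of your %%time cell entirely, one option (just a sketch, assuming train_ds and val_ds are already built and model is an already-compiled Keras model) is to iterate over the cached datasets once before starting the timed training:

# warm the cache: one full pass over each dataset pulls everything from Drive/disk
for _ in train_ds:
    pass
for _ in val_ds:
    pass

# the timed training now reads from the in-memory cache
history = model.fit(train_ds, validation_data=val_ds, epochs=10)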
I am trying to train a fairly complex GCN network on my 10 GB GPU. It runs smoothly until epoch 87, but then the Spyder kernel restarts. Is it because of a memory issue, and if so, how can I handle it?
As you mentioned, if the model is large it is a good idea to save a checkpoint after every epoch.
import os
import torch

## after every epoch
path = os.path.join(SAVE_DIR, 'model.pth')  # SAVE_DIR as defined in your setup
torch.save(model.cpu().state_dict(), path)  # saving the model parameters
model.cuda()  # moving the model back to the GPU for further training

## if the kernel terminates, load the model parameters
device = torch.device("cuda")
model = TheModelClass()
model.load_state_dict(torch.load(path))
model.train()
model.to(device)
This way, if anything happens during the process, you can resume from the last completed epoch.
From the information given, it is hard to tell exactly what causes the kernel to terminate. RAM overloading is less likely because of the GPU acceleration and the PyTorch framework, but it could be the cause.
Either way, the solution above will help you resume.
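If you also want training to pick up exactly where it left off, the optimizer state can be stored in the same file (a sketch, assuming an optimizer object and an epoch counter already exist in your training loop):

# save model and optimizer together after every epoch
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}, path)

# after a kernel restart
checkpoint = torch.load(path)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1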
How can I train an XGBoost model on a GPU but run predictions on CPU without allocating any GPU RAM?
My situation: I create an XGBoost model (tree_method='gpu_hist') in Python with predictor='cpu_predictor', then I train it on the GPU, save (pickle) it to disk, read the model back from disk, and use it for predictions.
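For reference, a rough sketch of that workflow with dummy data (the parameter names are the ones from the question; the file name is just an example):

import pickle
import numpy as np
import xgboost as xgb

X, y = np.random.rand(1000, 20), np.random.randint(0, 2, 1000)

# train on the GPU, but request the CPU predictor for inference
model = xgb.XGBClassifier(tree_method='gpu_hist', predictor='cpu_predictor')
model.fit(X, y)

# save to disk, then reload in the prediction process
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

preds = model.predict(X)  # even with the CPU predictor, roughly 289 MB of GPU RAM ends up allocated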
My problem: once the model starts doing predictions, even though I run it on the CPU, it still allocates a small amount of GPU RAM (around 289 MB). This is a problem for the following reasons:
I run multiple copies of the model to parallelize predictions and if I run too many, the prediction processes crash.
I cannot use the GPU for training other models if I run predictions on the same machine at the same time.
So, how can one tell XGBoost not to allocate any GPU RAM and to use only the CPU and regular RAM for predictions?
Thank you very much for your help!
I have a huge TensorFlow model (the checkpoint file is 4-5 GB). I was wondering if there is a different way to save TensorFlow models, besides checkpoints, that is more space/memory efficient.
I know that a checkpoint file also saves all the optimizer gradients, so maybe those can be cut out too.
My model is very simple, just two matrices of embeddings; perhaps I can save only those matrices to .npy files directly?
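That .npy idea would look roughly like this in TF1-style code (a sketch; sess is assumed to be an open tf.Session and embeddings_1 / embeddings_2 are hypothetical names for the two embedding variables):

import numpy as np

# evaluate only the embedding matrices and save them, without any optimizer state
np.save('embeddings_1.npy', sess.run(embeddings_1))
np.save('embeddings_2.npy', sess.run(embeddings_2))

# later: restore them as plain arrays
emb1 = np.load('embeddings_1.npy')
emb2 = np.load('embeddings_2.npy')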
What you want to do with the checkpoint is to freeze it. Check out this page from TensorFlow's official documentation.
The freezing process strips all extraneous information that isn't used for forward inference out of the checkpoint. TensorFlow provides an easy-to-use script for it called freeze_graph.py.
I am using keras-rl to train my network with the D-DQN algorithm. I am running my training on the GPU with the model.fit_generator() function to allow data to be sent to the GPU while it is doing backprops. I suspect the generation of data to be too slow compared to the speed of processing data by the GPU.
In the generation of data, as instructed in the D-DQN algorithm, I must first predict Q-values with my models and then use these values for the backpropagation. And if the GPU is used to run these predictions, it means that they are breaking the flow of my data (I want backprops to run as often as possible).
Is there a way I can specify on which device to run specific operations? In a way that I could run the predictions on the CPU and the backprops on the GPU.
Maybe you can save the model at the end of training. Then start another Python file and write os.environ["CUDA_VISIBLE_DEVICES"] = "-1" before you import any Keras or TensorFlow stuff. Now you should be able to load the model and make predictions with your CPU.
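A minimal sketch of that separate prediction script (the model path and input shape are just placeholders):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # hide all GPUs; must be set before keras/tensorflow are imported

import numpy as np
from keras.models import load_model

model = load_model('model.h5')                        # placeholder path
preds = model.predict(np.random.rand(1, 84, 84, 4))   # dummy input; adjust the shape to your network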
It's hard to properly answer your question without seeing your code.
The code below shows how you can list the available devices and force tensorflow to use a specific device.
import tensorflow as tf
from tensorflow.python.client import device_lib

def get_available_devices():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos]

get_available_devices()

with tf.device('/gpu:0'):
    ...  # do GPU stuff here

with tf.device('/cpu:0'):
    ...  # do CPU stuff here