Is there a way to reliably enable CUDA on the whole model?
I want to run the training on my GPU. I found on some forums that I need to apply .cuda() on anything I want to use CUDA with (I've applied it to everything I could without making the program crash). Surprisingly, this makes the training even slower.
Then I found that you could use torch.set_default_tensor_type('torch.cuda.FloatTensor') to make CUDA the default. With both enabled, nothing changes. What is happening?
You can use the tensor.to(device) command to move a tensor to a device.
The .to() command is also used to move a whole model to a device, like in the post you linked to.
Another possibility is to set the device of a tensor during creation using the device= keyword argument, like in t = torch.tensor(some_list, device=device)
To set the device dynamically in your code, you can use
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
to set cuda as your device if possible.
There are various code examples on PyTorch Tutorials and in the documentation linked above that could help you.
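Putting those pieces together, here is a minimal sketch of all three approaches; the tensor contents and the nn.Linear module are arbitrary placeholders, not code from your program:

import torch
import torch.nn as nn

# Pick cuda if it is available, otherwise fall back to cpu
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

t1 = torch.tensor([1.0, 2.0, 3.0]).to(device)        # move after creation
t2 = torch.tensor([1.0, 2.0, 3.0], device=device)    # create directly on the device

model = nn.Linear(3, 1).to(device)                   # move a whole model
out = model(t2)                                       # model and input now live on `device`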
With both enabled, nothing changes.
That is because you have already moved every tensor to the GPU, so changing the default tensor type has nothing left to affect.
Is there a way to reliably enable CUDA on the whole model?
model.to('cuda')
I've applied it to everything I could
You only need to apply it to the tensors the model will be interacting with (see the sketch after this list), generally:
the model's parameters model.to('cuda')
the features data features = features.to('cuda')
the target data targets = targets.to('cuda')
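Here is a minimal sketch of one training step with those three pieces moved to the GPU, assuming a CUDA device is present; the model, data shapes, loss, and optimizer are placeholders, not your actual code:

import torch
import torch.nn as nn

model = nn.Linear(10, 1).to('cuda')                # model parameters on the GPU
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

features = torch.randn(32, 10).to('cuda')          # features on the GPU
targets = torch.randn(32, 1).to('cuda')            # targets on the GPU

optimizer.zero_grad()
loss = loss_fn(model(features), targets)
loss.backward()
optimizer.step()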
Related
I have been stuck trying to train my PyTorch model on the GPU. The model works perfectly on the CPU, though. I have been using Google Colab's GPU resources for using cuda.
I know that in order to run a model on the GPU, the 'model', 'input features' and 'target' need to be on the 'cuda' device.
But, no matter what I do in my code, I either keep getting the error:
RuntimeError: Input and hidden tensors are not at the same device, found input tensor at cuda:0 and hidden tensor at cpu
OR
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Here is my notebook:
https://colab.research.google.com/drive/1rviS_4hmdzPQUncZyi8FsRH7y3jL0isQ
It would be really helpful if someone could tell me exactly which variables need to be moved using .to('cuda').
Additionally, explanations/suggestions for ensuring that this does not recur in the future would be highly appreciated. Thank you!
Your self.hidden is a tuple of torch.tensors. PyTorch doesn't automatically move this kind of tensor to the GPU when .to(device) is invoked on your model.
You can either:
Implement your own to(self, type, device) method for your BiLSTM_CRF class. (Not recommended).
Register the tensors in self.hidden as buffers (see the sketch below). This way all nn.Module methods such as .to(), .float(), etc. will also be applied to self.hidden.
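A minimal sketch of the second option, assuming self.hidden is the usual (h, c) pair of an LSTM; the layer sizes and the module skeleton are made up for illustration, not the asker's actual BiLSTM_CRF:

import torch
import torch.nn as nn

class BiLSTM_CRF(nn.Module):          # hypothetical skeleton for illustration
    def __init__(self, hidden_dim=8):
        super().__init__()
        self.lstm = nn.LSTM(hidden_dim, hidden_dim // 2, bidirectional=True)
        # Register each tensor of the hidden state as a buffer so that
        # model.to(device) moves it together with the parameters.
        self.register_buffer('hidden_h', torch.zeros(2, 1, hidden_dim // 2))
        self.register_buffer('hidden_c', torch.zeros(2, 1, hidden_dim // 2))

    def forward(self, x):              # x: (seq_len, batch=1, hidden_dim)
        out, _ = self.lstm(x, (self.hidden_h, self.hidden_c))
        return out

model = BiLSTM_CRF().to('cuda' if torch.cuda.is_available() else 'cpu')
# model.hidden_h and model.hidden_c are now on the same device as the weights.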
First you have to configure the device you want to use: if you are on GPU, change it to CPU, and the reverse is also true.
I would like to figure out whether a PyTorch model is on cpu or cuda, in order to
initialize some other variable as torch.Tensor or torch.cuda.Tensor depending on the model.
However, looking at the output of the dir() function I see only the .cpu(), .cuda(), and .to() methods, which put the model on the CPU, the GPU, or whatever device is specified in .to(). A PyTorch tensor has an is_cuda attribute, but there is no analogue for the whole model.
Is there some way to deduce this for a model, or one needs to refer to a particular weight?
No, there is no such function for nn.Module; I believe this is because the parameters could be on multiple devices at the same time.
If you're working with a single device, a workaround is to check the first parameter:
next(model.parameters()).is_cuda
As described here.
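For completeness, a small sketch (the nn.Linear model is just a stand-in) that also uses the .device attribute of the first parameter, which gives the actual device rather than just a boolean:

import torch
import torch.nn as nn

model = nn.Linear(4, 2)                      # any single-device model

print(next(model.parameters()).is_cuda)      # False on the CPU, True after .cuda()
device = next(model.parameters()).device     # e.g. device(type='cpu') or cuda:0

# New tensors can then be created directly on the model's device:
x = torch.randn(8, 4, device=device)
print(model(x).shape)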
I am new to PyTorch, but it seems pretty nice. My only question was when to use tensor.to(device) or nn.Module.to(device).
I was reading the documentation on this topic, and it indicates that this method will move the tensor or model to the specified device. But it was not clear to me for which operations this is necessary, and what kind of errors I will get if I don't use .to() at the right time.
For example, if I just create a tensor, I imagine that the tensor is stored in CPU accessible memory until I move the tensor to the GPU. Once the tensor is on the GPU, then the GPU will execute any mathematical operations on that tensor.
However, do I have to worry about accidentally transferring the data tensor to the GPU while not transferring the model to the GPU? Will this just give me straightforward errors, or will it engage in a lot of expensive data transfer behind the scenes? This example is easy enough for me to test, but I was just wondering about other cases where it might not be so obvious.
Any guidance would be helpful.
It is necessary to have both the model and the data on the same device, either CPU or GPU, for the model to process the data. Data on the CPU and model on the GPU, or vice versa, will result in a RuntimeError.
You can set a variable device to cuda if it's available, else it will be set to cpu, and then transfer data and model to device :
import torch
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)
data = data.to(device)
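As a concrete illustration of that error, here is a self-contained sketch; the linear model and the random data are placeholders:

import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = nn.Linear(10, 1).to(device)   # model on the GPU, if one is available
data = torch.randn(32, 10)            # data still on the CPU

# On a CUDA machine the next line raises a RuntimeError about tensors
# being on different devices; PyTorch never copies across devices silently.
try:
    model(data)
except RuntimeError as e:
    print(e)

out = model(data.to(device))          # moving the data to the same device fixes it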
So in TensorFlow's guide for using GPUs there is a part about using multiple GPUs in a "multi-tower fashion":
...
for d in ['/device:GPU:2', '/device:GPU:3']:
with tf.device(d): # <---- manual device placement
...
Seeing this, one might be tempted to leverage this style for multiple GPU training in a custom Estimator to indicate to the model that it can be distributed across multiple GPUs efficiently.
To my knowledge, if manual device placement is absent, TensorFlow does not have some form of optimal device mapping (except perhaps that if you have the GPU version installed and a GPU is available, it is used over the CPU). So what other choice do you have?
Anyway, you carry on with training your estimator, export it to a SavedModel via estimator.export_savedmodel(...), and wish to use this SavedModel later... perhaps on a different machine, one which may not have as many GPUs as the machine on which the model was trained (or maybe no GPUs at all).
so when you run
from tensorflow.contrib import predictor
predict_fn = predictor.from_saved_model(model_dir)
you get
Cannot assign a device for operation <OP-NAME>. Operation was
explicitly assigned to <DEVICE-NAME> but available devices are
[<AVAILABLE-DEVICE-0>,...]
An older Stack Overflow post suggests that changing device placement was not possible... but hopefully over time things have changed.
Thus my question is:
1. When loading a SavedModel, can I change the device placement to be appropriate for the device it is loaded on? E.g. if I train a model with 6 GPUs and a friend wants to run it at home with their e-GPU, can they set '/device:GPU:1' through '/device:GPU:5' to '/device:GPU:0'?
2. If 1 is not possible, is there a (painless) way for me, in the custom Estimator's model_fn, to specify how to generically distribute a graph?
e.g.
with tf.device('available-gpu-3')
where available-gpu-3 is the third available GPU if there are three or more GPUs, otherwise the second or first available GPU, and if no GPU it is CPU
This matters because if there is a shared machine which is training two models, say one model on '/device:GPU:0' while the other model is trained explicitly on GPUs 1 and 2, then on another 2-GPU machine GPU 2 will not be available...
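A sketch of the kind of helper question 2 is asking for, written against TF 1.x; pick_device is a hypothetical function, not a TensorFlow API:

import tensorflow as tf
from tensorflow.python.client import device_lib

def pick_device(preferred_index):
    # Return the preferred GPU if it exists, else the last available GPU,
    # else the CPU. `preferred_index` is 1-based (third GPU -> 3).
    gpus = [d.name for d in device_lib.list_local_devices()
            if d.device_type == 'GPU']
    if not gpus:
        return '/cpu:0'
    return gpus[min(preferred_index, len(gpus)) - 1]

with tf.device(pick_device(3)):
    # ... build the part of the graph you want on "available-gpu-3" ...
    pass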
I have been doing some research on this topic recently, and to my knowledge your question 1 can work only if you clear all devices when you export the model in the original TensorFlow code, with the flag clear_devices=True.
In my own code, it looks like
builder = tf.saved_model.builder.SavedModelBuilder('osvos_saved')
builder.add_meta_graph_and_variables(sess, ['serve'], clear_devices=True)
builder.save()
If you only have an exported model, it seems not to be possible. You can refer to this issue.
I'm currently trying to find a way to fix this, as stated in my Stack Overflow question. I hope the workaround can help you.
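If the re-export with clear_devices=True succeeds, loading the model with the predictor from the question should then work without the device error; 'osvos_saved' is just the export directory from the snippet above:

from tensorflow.contrib import predictor

# Load the SavedModel whose device annotations were cleared on export.
predict_fn = predictor.from_saved_model('osvos_saved')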
I'm trying to run the CIFAR10 tutorial with the training code on one gpu and the eval code on the other. I know for sure I have two gpus on my computer, and I can test this by running the simple examples here: https://www.tensorflow.org/how_tos/using_gpu/index.html
However, using a with tf.device('/gpu:0') block does not work for most variables in the CIFAR example. I tried a whole lot of combinations of different variables on GPU vs. CPU, or all the variables on one or the other. It is always the same error for some variable, something like this:
Cannot assign a device to node 'shuffle_batch/random_shuffle_queue': Could not satisfy explicit device specification '/gpu:0'
Is this possibly a bug in Tensor Flow or am I missing something?
Could not satisfy explicit device specification means you do not have the corresponding device. Do you actually have a CUDA-enabled GPU on your machine?
UPDATE: As it turned out in the discussion below, this error is also raised if the particular operation (in this case, RandomShuffleQueue) cannot be executed on the GPU, because it only has a CPU implementation.
If you are fine with TensorFlow choosing a device for you (particularly, falling back to CPU when no GPU implementation is available), consider setting allow_soft_placement in your configuration, as per this article:
sess = tf.Session(config=tf.ConfigProto(
    allow_soft_placement=True, log_device_placement=True))
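A small TF 1.x sketch of how soft placement plays out in practice; the matmul graph here is only for illustration:

import tensorflow as tf

# Pin ops to the GPU; ops without a GPU kernel (such as RandomShuffleQueue)
# are silently placed on the CPU because allow_soft_placement is set.
with tf.device('/gpu:0'):
    a = tf.random_normal([1000, 1000])
    b = tf.matmul(a, a)

sess = tf.Session(config=tf.ConfigProto(
    allow_soft_placement=True, log_device_placement=True))
print(sess.run(b).shape)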