I want to run PyTorch using CUDA. I call model.cuda() and create all tensors as torch.cuda.LongTensor().
Do I have to create tensors using .cuda explicitly if I have used model.cuda()?
Is there a way to make all computations run on GPU by default?
I do not think you can specify that you want to use cuda tensors by default.
However, you should have a look at the official PyTorch examples.
In the ImageNet training/testing script, they use a wrapper over the model called DataParallel.
This wrapper has two advantages:
it handles the data parallelism over multiple GPUs
it handles the casting of cpu tensors to cuda tensors
As you can see at L164, you don't have to manually cast your inputs/targets to cuda.
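For reference, here is a minimal sketch of that pattern (model and inputs are assumed to already exist; this is not the exact ImageNet script):

import torch.nn as nn

# wrap the model and move its parameters to the GPU(s)
model = nn.DataParallel(model).cuda()

# inputs may stay on the CPU; DataParallel scatters them onto the GPUs
outputs = model(inputs)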
Note that if you have multiple GPUs and want to use a single one, launch any Python/PyTorch script with the CUDA_VISIBLE_DEVICES prefix. For instance: CUDA_VISIBLE_DEVICES=0 python main.py.
Yes. You can set the default tensor type to cuda with:
torch.set_default_tensor_type('torch.cuda.FloatTensor')
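For example (a small sketch): after this call, newly created float tensors live on the GPU:

import torch

torch.set_default_tensor_type('torch.cuda.FloatTensor')

x = torch.zeros(3)   # now a torch.cuda.FloatTensor
print(x.is_cuda)     # True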
Do I have to create tensors using .cuda explicitly if I have used model.cuda()?
Yes, you need to move not only your model [parameter] tensors to cuda, but also the data features and targets (and any other tensors used by the model).
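In other words, something along these lines (a sketch; model, features, and targets stand for your own objects):

model = model.cuda()        # model parameters on the GPU
features = features.cuda()  # input batch on the GPU
targets = targets.cuda()    # labels on the GPU
output = model(features)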
Related
I would like to figure out whether a PyTorch model is on CPU or CUDA, in order to
initialize some other variable as torch.Tensor or torch.cuda.Tensor depending on the model.
However, looking at the output of the dir() function, I see only the .cpu(), .cuda(), and .to() methods, which move the model to a device (GPU, or whatever other device is specified in .to()). A PyTorch tensor has an is_cuda attribute, but there is no analogue for a whole model.
Is there some way to deduce this for a model, or does one need to refer to a particular weight?
No, there is no such function for nn.Module; I believe this is because parameters could be on multiple devices at the same time.
If you're working with a single device, a workaround is to check the first parameter:
next(model.parameters()).is_cuda
As described here.
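For the original use case (creating another tensor on the same device as the model), a small sketch, assuming the model sits on a single device:

import torch

param = next(model.parameters())
# create the new tensor on whatever device the model lives on
extra = torch.zeros(10, device=param.device)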
I've tried to load a pretrained model from an article, but I can't do this because I have a single-GPU system, while the model is explicitly set to use gpu:0 and gpu:1. What can I do to load this model on my PC?
I have Ubuntu, Python 3.7, CUDA 10, and TensorFlow 2.0.
You need to use gpu:0 everywhere, since your machine has only one GPU. Also, as the code is written to use multiple GPUs, it needs to be changed everywhere it is configured for multi-GPU processing (data or model parallelism). Please refer to the link below for more information on GPU usage:
https://jhui.github.io/2017/03/07/TensorFlow-GPU/
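As a rough sketch of the kind of change meant above (build_model and inputs are placeholders for the code from the article):

import tensorflow as tf

# replace every multi-GPU placement such as '/gpu:1' with the single GPU present
with tf.device('/gpu:0'):
    logits = build_model(inputs)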
So in TensorFlow's guide for using GPUs there is a part about using multiple GPUs in a "multi-tower fashion":
...
for d in ['/device:GPU:2', '/device:GPU:3']:
    with tf.device(d):  # <---- manual device placement
        ...
Seeing this, one might be tempted to leverage this style for multiple GPU training in a custom Estimator to indicate to the model that it can be distributed across multiple GPUs efficiently.
To my knowledge, if manual device placement is absent, TensorFlow does not do any form of optimal device mapping (except, perhaps, using a GPU over the CPU if you have the GPU version installed and a GPU is available). So what other choice do you have?
Anyway, you carry on with training your estimator, export it to a SavedModel via estimator.export_savedmodel(...), and wish to use this SavedModel later... perhaps on a different machine, one which may not have as many GPUs as the machine on which the model was trained (or maybe no GPUs at all).
so when you run
from tensorflow.contrib import predictor
predict_fn = predictor.from_saved_model(model_dir)
you get
Cannot assign a device for operation <OP-NAME>. Operation was
explicitly assigned to <DEVICE-NAME> but available devices are
[<AVAILABLE-DEVICE-0>,...]
An older Stack Overflow post suggests that changing device placement is not possible... but hopefully things have changed over time.
Thus my questions are:
when loading a SavedModel, can I change the device placement to be appropriate for the device it is loaded on? E.g. if I train a model with 6 GPUs and a friend wants to run it at home with their e-GPU, can they remap '/device:GPU:1' through '/device:GPU:5' to '/device:GPU:0'?
if 1 is not possible, is there a (painless) way for me, in the custom Estimator's model_fn, to specify how to generically distribute a graph?
e.g.
with tf.device('available-gpu-3')
where available-gpu-3 is the third available GPU if there are three or more GPUs, otherwise the second or first available GPU, and, if there is no GPU, the CPU (a rough sketch of what I mean is given after this question).
This matters because if there is a shared machine which is training two models, say one model on '/device:GPU:0' and the other explicitly on GPUs 1 and 2, then on another machine with only 2 GPUs, GPU 2 will not be available...
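A rough sketch of what I mean by available-gpu-3 (this helper is hypothetical, not an existing TensorFlow API; it assumes TF 1.x and the device_lib utility):

from tensorflow.python.client import device_lib

def nth_available_device(n):
    """Return the n-th available GPU (1-based), the last GPU if fewer than n exist, or the CPU if there is none."""
    gpus = [d.name for d in device_lib.list_local_devices() if d.device_type == 'GPU']
    if not gpus:
        return '/cpu:0'
    return gpus[min(n, len(gpus)) - 1]

# e.g. with tf.device(nth_available_device(3)): ...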
I have been doing some research on this topic recently, and to my knowledge your question 1 can only work if you clear all devices when you export the model in the original TensorFlow code, with the flag clear_devices=True.
In my own code, it looks like
builder = tf.saved_model.builder.SavedModelBuilder('osvos_saved')
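# clear_devices=True strips the hard-coded device assignments from the exported graph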
builder.add_meta_graph_and_variables(sess, ['serve'], clear_devices=True)
builder.save()
If you only have an exported model, it seems not possible. You can refer to this issue.
I'm currently trying to find a way to fix this, as stated in my Stack Overflow question. I hope the workaround can help you.
Is there a way to reliably enable CUDA on the whole model?
I want to run the training on my GPU. I found on some forums that I need to apply .cuda() to anything I want to use CUDA with (I've applied it to everything I could without making the program crash). Surprisingly, this makes the training even slower.
Then I found that you could use torch.set_default_tensor_type('torch.cuda.FloatTensor') to use CUDA. With both enabled, nothing changes. What is happening?
You can use the tensor.to(device) command to move a tensor to a device.
The .to() command is also used to move a whole model to a device, like in the post you linked to.
Another possibility is to set the device of a tensor during creation using the device= keyword argument, like in t = torch.tensor(some_list, device=device)
To set the device dynamically in your code, you can use
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
to set cuda as your device if possible.
There are various code examples on PyTorch Tutorials and in the documentation linked above that could help you.
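Putting those pieces together, a minimal sketch (MyModel and some_list stand for your own code):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = MyModel().to(device)                # move the model's parameters
t = torch.tensor(some_list, device=device)  # create the tensor directly on the device
output = model(t)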
With both enabled, nothing changes.
That is because you have already placed every tensor on the GPU.
Is there a way to reliably enable CUDA on the whole model?
model.to('cuda')
I've applied it to everything I could
You only need to apply it to the tensors the model will be interacting with, generally (see the sketch after this list):
the model's parameters: model.to('cuda')
the feature data: features = features.to('cuda')
the target data: targets = targets.to('cuda')
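A small sketch of a training step with all three in place (model, loader, criterion, and optimizer are assumed to exist):

model = model.to('cuda')

for features, targets in loader:
    features = features.to('cuda')
    targets = targets.to('cuda')

    output = model(features)
    loss = criterion(output, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()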
I am trying to train a slim model using 3 GPUs.
I am specifically telling TF to use the second GPU to allocate the model:
with tf.device('device:GPU:1'):
    logits, end_points = inception_v3(inputs)
However, I'm getting an OOM error on that GPU every time I run my code. I've tried to reduce the batch_size so the model fits in memory, but then the net is ruined.
I own 3 GPUs, so is there a way to tell TF to use my third GPU when the second is full? I've tried not telling TF to use any GPU and allowing soft placement, but that is not working either.
The statement with tf.device('device:GPU:1') tells TensorFlow specifically to use GPU-1, so it won't attempt to use any other device you have.
When the model is too big, the recommended way is to use model parallelism by manually splitting your graph across different GPUs. The complication in your case is that the model definition lives in the library, so you can't insert tf.device statements for different layers unless you patch TensorFlow.
But there is a workaround:
You can define and place the variables before invoking the inception_v3 builder. This way inception_v3 will reuse these variables without changing their placement. Example:
with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
    with tf.device('device:GPU:1'):
        tf.get_variable("InceptionV3/Logits/Conv2d_1c_1x1/biases", shape=[1000])
        tf.get_variable("InceptionV3/Logits/Conv2d_1c_1x1/weights", shape=[1, 1, 2048, 1000])

    with tf.device('device:GPU:0'):
        logits, end_points = inception_v3(inputs)
Upon running, you'll see that all variables except Conv2d_1c_1x1 are placed on GPU-0, while the Conv2d_1c_1x1 layer is on GPU-1.
The drawback is that you need to know the name and shape of each variable you want to pre-place. But it is doable, and at least it can get your model running.
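If you are unsure of the variable names and shapes, one (hypothetical) way to discover them is to build the graph once just to inspect it, using the same inception_v3 builder as above:

import tensorflow as tf

with tf.Graph().as_default():
    inputs = tf.placeholder(tf.float32, [None, 299, 299, 3])
    inception_v3(inputs)  # the same builder used above
    for v in tf.global_variables():
        print(v.name, v.shape)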