I use this command to use a single GPU:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
But I want to use two GPUs in Jupyter, like this:
device = torch.device("cuda:0,1" if torch.cuda.is_available() else "cpu")
Assuming that you want to distribute the data across the available GPUs (if you have a batch size of 16 and 2 GPUs, you are probably looking to provide 8 samples to each GPU), and not to spread parts of the model across different GPUs, this can be done as follows:
If you want to use all the available GPUs:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CreateModel()
model = nn.DataParallel(model)
model.to(device)
If you want to use specific GPUs:
(For example, using 2 out of 4 GPUs)
device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu") ## "cuda:1,3" is not a valid device string; use the first id from device_ids (GPU ids start from 0)
model = CreateModel()
model = nn.DataParallel(model, device_ids=[1, 3])
model.to(device)
To use specific GPUs by setting an OS environment variable:
Before executing the program, set CUDA_VISIBLE_DEVICES variable as follows:
export CUDA_VISIBLE_DEVICES=1,3  (assuming you want to select the 2nd and 4th GPU)
Then, within the program, you can just use DataParallel() as though you want to use all the GPUs (similar to the 1st case). Here the GPUs available to the program are restricted by the OS environment variable.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CreateModel()
model = nn.DataParallel(model)
model.to(device)
In all of these cases, the data has to be mapped to the device.
If X and y are the data (note that for tensors, unlike for modules, .to() is not in-place, so you must reassign):
X = X.to(device)
y = y.to(device)
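For context, here is a fuller sketch of how this typically looks in a training loop (the linear model, dummy data, loss and optimizer are placeholders for illustration, not code from the answer above):

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2)                      # stand-in for CreateModel()
model = nn.DataParallel(model)                # splits each batch across the visible GPUs
model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Dummy data just to make the sketch runnable
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
loader = DataLoader(dataset, batch_size=16)

for X, y in loader:
    X = X.to(device)                          # .to() on a tensor returns a copy, so reassign
    y = y.to(device)
    loss = criterion(model(X), y)             # with 2 GPUs, each sees 8 of the 16 samples
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()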
Using multiple GPUs is as simple as wrapping a model in DataParallel and increasing the batch size. Check these two tutorials for a quick start:
Multi-GPU Examples
Data Parallelism
Another option would be to use some helper libraries for PyTorch:
PyTorch Ignite library Distributed GPU training
In there, there is the concept of a context manager for distributed configuration on:
nccl - torch native distributed configuration on multiple GPUs
xla-tpu - TPUs distributed configuration
PyTorch Lightning Multi-GPU training
This is possibly the best option IMHO to train on CPU/GPU/TPU without changing your original PyTorch code (see the sketch below).
It's worth checking Catalyst for similar distributed GPU options.
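As an illustrative sketch only (assuming pytorch_lightning >= 1.7; the LitModel class and the dummy data are placeholders I made up, not code from any of these libraries' docs), multi-GPU training with Lightning can look roughly like this:

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    # Tiny placeholder LightningModule; Lightning handles device placement itself.
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(10, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.net(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)

# Dummy data so the sketch runs end to end
loader = DataLoader(TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))), batch_size=16)

trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp", max_epochs=1)
trainer.fit(LitModel(), train_dataloaders=loader)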
In 2022, PyTorch says:
It is recommended to use DistributedDataParallel, instead of this class, to do multi-GPU training, even if there is only a single node. See: Use nn.parallel.DistributedDataParallel instead of multiprocessing or nn.DataParallel and Distributed Data Parallel.
from https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html#torch.nn.DataParallel
Thus, it seems that we should use DistributedDataParallel, not DataParallel.
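For reference, here is a minimal DistributedDataParallel sketch (my own illustration, assuming a recent PyTorch launched with torchrun; the linear model and dummy data are placeholders):

# Launch with: torchrun --nproc_per_node=2 train_ddp.py
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")      # torchrun sets RANK/WORLD_SIZE/LOCAL_RANK
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(10, 2).to(local_rank)      # one model replica per process/GPU
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(10):                          # dummy training steps
        x = torch.randn(16, 10, device=local_rank)
        y = torch.randint(0, 2, (16,), device=local_rank)
        loss = nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

In real code you would also wrap your DataLoader with a DistributedSampler so that each process sees a different shard of the data.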
When I ran naiveinception_googlenet, the above methods didn't work for me. The following method solved my problem.
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,3" # specify which GPU(s) to be used
If you want to run your code only on specific GPUs (e.g. only on GPUs 2 and 3), you can specify that with the CUDA_VISIBLE_DEVICES=2,3 variable when launching the Python code from the terminal.
CUDA_VISIBLE_DEVICES=2,3 python lstm_demo_example.py --epochs=30 --lr=0.001
and inside the code, leave it as:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LSTMModel()
model = nn.DataParallel(model)
model = model.to(device)
Source : https://glassboxmedicine.com/2020/03/04/multi-gpu-training-in-pytorch-data-and-model-parallelism/
Related
Earlier, I had configured the following project
https://github.com/zllrunning/face-makeup.PyTorch
using PyTorch with CUDA 10.2. Now PyTorch with CUDA 10.2 support is not available for Windows.
So, when I configure the same project using PyTorch with CUDA 11.3, I get the following error:
RuntimeError: Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cpu". This is no longer allowed; the devices must match.
Please help me solve this problem.
I solved this by adding map_location=lambda storage, loc: storage.cuda() to the model_zoo.load_url call. I think in torch 1.12 they changed the default load location from GPU to CPU (which does not make any sense to me).
Edit:
In the file resnet.py, under the function def init_weight(self):, the following line
state_dict = modelzoo.load_url(resnet18_url)
is changed to
state_dict = modelzoo.load_url(resnet18_url, map_location=lambda storage, loc: storage.cuda())
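More generally (a sketch not tied to this repository; the checkpoint path and the torchvision model are placeholders), the same map_location idea applies to any checkpoint that was saved on GPU and is loaded on a machine where the devices differ:

import torch
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# map_location remaps the stored tensors onto an existing device at load time,
# so a GPU-saved checkpoint can also be loaded on a CPU-only machine.
state_dict = torch.load("resnet18_checkpoint.pth", map_location=device)  # placeholder path

model = models.resnet18()          # stand-in for the repo's ResNet backbone
model.load_state_dict(state_dict)
model.to(device)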
Not specifically related to ResNet, but the code below should work for the majority of issues along the lines of "don't know how to restore data location of torch.storage._untyped storage (tagged with gpu)":
import torch
import whisper
devices = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = whisper.load_model("medium", device=devices)
I am using CUDA in the PyTorch framework on a Linux server with multiple CUDA devices.
The problem is that even though I specified certain GPUs to be visible, the program keeps using only the first GPU.
(Other programs work fine and their specified GPUs are allocated correctly, so I don't think it is an NVIDIA or system problem. nvidia-smi shows all GPUs and reports no problem. I didn't have problems allocating GPUs with the code below before, except when the system was not working.)
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBILE_DEVICES"] = str(args.gpu)
I wrote that before running the main function, and it works fine for other programs on the same system.
I printed the args.gpu variable and could see that its value is not "0".
Have you tried something like this?
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") ## "cuda:0,1" is not a valid device string; the GPUs are selected via device_ids below (GPU ids start from 0)
model = CreateModel()
model = nn.DataParallel(model, device_ids=[0, 1])
model.to(device)
Let me know if this works.
So in TensorFlow's guide for using GPUs there is a part about using multiple GPUs in a "multi-tower fashion":
...
for d in ['/device:GPU:2', '/device:GPU:3']:
with tf.device(d): # <---- manual device placement
...
Seeing this, one might be tempted to leverage this style for multiple GPU training in a custom Estimator to indicate to the model that it can be distributed across multiple GPUs efficiently.
To my knowledge, if manual device placement is absent, TensorFlow does not perform some form of optimal device mapping (except, perhaps, that if you have the GPU version installed and a GPU is available, it is used over the CPU). So what other choice do you have?
Anyway, you carry on with training your estimator, export it to a SavedModel via estimator.export_savedmodel(...), and wish to use this SavedModel later... perhaps on a different machine, one which may not have as many GPUs as the machine on which the model was trained (or maybe no GPUs at all).
So when you run
from tensorflow.contrib import predictor
predict_fn = predictor.from_saved_model(model_dir)
you get
Cannot assign a device for operation <OP-NAME>. Operation was
explicitly assigned to <DEVICE-NAME> but available devices are
[<AVAILABLE-DEVICE-0>,...]
An older S.O. post suggests that changing the device placement was not possible... but hopefully things have changed over time.
Thus my questions are:
1. When loading a SavedModel, can I change the device placement to be appropriate for the device it is loaded on? E.g., if I train a model with 6 GPUs and a friend wants to run it at home with their eGPU, can they map '/device:GPU:1' through '/device:GPU:5' to '/device:GPU:0'?
2. If 1 is not possible, is there a (painless) way for me, in the custom Estimator's model_fn, to specify how to generically distribute a graph?
e.g.
with tf.device('available-gpu-3')
where available-gpu-3 is the third available GPU if there are three or more GPUs, otherwise the second or first available GPU, and the CPU if there is no GPU at all.
This matters because on a shared machine that is training two models, with, say, one model on '/device:GPU:0', the other model is trained explicitly on GPUs 1 and 2... so on another machine with 2 GPUs, GPU 2 will not be available.
I have been doing some research on this topic recently, and to my knowledge your question 1 can work only if you clear all devices when you export the model in the original TensorFlow code, with the flag clear_devices=True.
In my own code, it looks like
builder = tf.saved_model.builder.SavedModelBuilder('osvos_saved')
builder.add_meta_graph_and_variables(sess, ['serve'], clear_devices=True)
builder.save()
If you only have an exported model, it seems not possible. You can refer to this issue.
I'm currently trying to find a way to fix this, as stated in my Stack Overflow question. I hope the workaround can help you.
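As a further sketch on my part (not from the answer above, and it may not help when devices were hard-coded at export time; re-exporting with clear_devices=True remains the reliable fix): in TF 1.x you can try loading the SavedModel into a session with soft placement enabled, so that ops pinned to missing GPUs may fall back to devices that do exist. The 'osvos_saved' directory name just mirrors the export example above.

import tensorflow as tf

# allow_soft_placement lets TF place an op on an available device when the
# device recorded in the graph (e.g. /device:GPU:3) does not exist here.
config = tf.ConfigProto(allow_soft_placement=True)

with tf.Session(graph=tf.Graph(), config=config) as sess:
    tf.saved_model.loader.load(sess, ['serve'], 'osvos_saved')
    # ... look up the input/output tensors by name and call sess.run(...) as usual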
I am new to the deep learning area. Right now I am reproducing a paper's code. Since they use several GPUs, there is a command torch.nn.DataParallel(model, device_ids=args.gpus).cuda() in the code. But I only have one GPU. What should I change in this code to match my GPU?
Thank you!
DataParallel should work on a single GPU as well, but you should check whether args.gpus only contains the id of the device that is to be used (it should be 0) or is None.
Choosing None will make the module use all available devices.
Also, you could remove DataParallel, as you do not need it, and move the model to the GPU just by calling model.cuda() or, as I prefer, model.to(device), where device is the device's name.
Example:
This example shows how to use a model on a single GPU, setting the device using .to() instead of .cuda().
from torch import nn
import torch
# Set device to cuda if cuda is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Create model
model = nn.Sequential(
nn.Conv2d(1,20,5),
nn.ReLU(),
nn.Conv2d(20,64,5),
nn.ReLU()
)
# moving model to GPU
model.to(device)
If you want to use DataParallel, you could do it like this:
# Optional DataParallel, not needed for single GPU usage
model1 = torch.nn.DataParallel(model, device_ids=[0]).to(device)
# Or, using default 'device_ids=None'
model1 = torch.nn.DataParallel(model).to(device)
Is there a way to reliably enable CUDA on the whole model?
I want to run the training on my GPU. I found on some forums that I need to apply .cuda() on anything I want to use CUDA with (I've applied it to everything I could without making the program crash). Surprisingly, this makes the training even slower.
Then, I found that you could use torch.set_default_tensor_type('torch.cuda.FloatTensor') to use CUDA. With both enabled, nothing changes. What is happening?
You can use the tensor.to(device) command to move a tensor to a device.
The .to() command is also used to move a whole model to a device, like in the post you linked to.
Another possibility is to set the device of a tensor during creation using the device= keyword argument, as in t = torch.tensor(some_list, device=device).
To set the device dynamically in your code, you can use
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
to set cuda as your device if possible.
There are various code examples on PyTorch Tutorials and in the documentation linked above that could help you.
With both enabled, nothing changes.
That is because you have already set every tensor to GPU.
Is there a way to reliably enable CUDA on the whole model?
model.to('cuda')
I've applied it to everything I could
You only need to apply it to tensors the model will be interacting with, generally:
the model's parameters: model.to('cuda')
the feature data: features = features.to('cuda')
the target data: targets = targets.to('cuda')
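Putting those three together, a minimal sketch (the linear model and the random tensors are placeholders for illustration):

import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2).to(device)                  # the model's parameters
features = torch.randn(4, 10, device=device)         # feature data created directly on the device
targets = torch.randint(0, 2, (4,), device=device)   # target data on the same device

output = model(features)                             # everything now lives on the same device
loss = nn.functional.cross_entropy(output, targets)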