Problems with torch.nn.DataParallel - python

I am new to the deep learning field and am currently reproducing the code from a paper. Since the authors use several GPUs, the code contains the command torch.nn.DataParallel(model, device_ids=args.gpus).cuda(). But I only have one GPU; what should I change in this code to match my setup?
Thank you!

DataParallel should work on a single GPU as well, but you should check if args.gpus only contains the id of the device that is to be used (should be 0) or None.
Choosing None will make the module use all available devices.
You could also remove DataParallel entirely, since you do not need it, and move the model to the GPU simply by calling model.cuda() or, as I prefer, model.to(device), where device is the target device.
Example:
This example shows how to use a model on a single GPU, setting the device using .to() instead of .cuda().
from torch import nn
import torch

# Use the GPU if CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create the model
model = nn.Sequential(
    nn.Conv2d(1, 20, 5),
    nn.ReLU(),
    nn.Conv2d(20, 64, 5),
    nn.ReLU()
)

# Move the model to the selected device
model.to(device)
If you want to use DataParallel, you could do it like this:
# Optional DataParallel, not needed for single GPU usage
model1 = torch.nn.DataParallel(model, device_ids=[0]).to(device)
# Or, using default 'device_ids=None'
model1 = torch.nn.DataParallel(model).to(device)
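Either way, a quick sanity check (just a sketch; the dummy batch shape is only chosen to fit the Conv2d(1, 20, 5) model above) is to push a batch through and confirm the output lives on the GPU:
# Dummy input created directly on the selected device; real data would be moved
# with x = x.to(device) in the same way.
x = torch.randn(8, 1, 32, 32, device=device)
out = model(x)
print(out.shape, out.device)   # e.g. torch.Size([8, 64, 24, 24]) on cuda:0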

Related

please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU

# set the computation device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Load model checkpoint
checkpoint = 'checkpoints/checkpoint_ssd300.pth.tar'
checkpoint = torch.load(checkpoint)
start_epoch = checkpoint['epoch'] + 1
print('\nLoaded checkpoint from epoch %d.\n' % start_epoch)
model = checkpoint['model']
model = model.to(device)
model.eval()
When I try to run this code block, I get the following problem:
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
The error message indicates that you are trying to load a model checkpoint that was saved on a GPU (CUDA device), but your current machine either has no GPU or CUDA is not available.
The line device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') checks whether CUDA is available on the current machine: if it is, device is set to 'cuda', otherwise to 'cpu'.
The line checkpoint = torch.load(checkpoint) loads the checkpoint from the specified file, but by default torch.load restores each tensor to the device it was saved on, 'cuda' in this case, which causes the error.
To resolve this issue, use the map_location argument of torch.load to specify that the checkpoint should be loaded onto the 'cpu' device instead of the 'cuda' device.
checkpoint = torch.load(checkpoint, map_location=torch.device('cpu'))
This way the model will be loaded onto the CPU, even if it was trained on a CUDA device.
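A more flexible variant (a sketch of the same idea) maps the checkpoint to whichever device is actually available, so the same script runs on both CPU-only and GPU machines:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# map_location=device restores the tensors on the available device instead of the one they were saved on
checkpoint = torch.load('checkpoints/checkpoint_ssd300.pth.tar', map_location=device)
model = checkpoint['model'].to(device)
model.eval()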

RuntimeError: Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cpu"

Earlier I configured the following project
https://github.com/zllrunning/face-makeup.PyTorch
using PyTorch built with CUDA 10.2. PyTorch with CUDA 10.2 support is no longer available for Windows,
so when I configure the same project using PyTorch with CUDA 11.3, I get the following error:
RuntimeError: Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cpu". This is no longer allowed; the devices must match.
Please help me solve this problem.
I solved this by adding map_location=lambda storage, loc: storage.cuda() to the model_zoo.load_url call. I think in torch 1.12 they changed the default map location from GPU to CPU (which does not make much sense to me).
Edit:
In the file resnet.py, inside the function def init_weight(self):, the line
state_dict = modelzoo.load_url(resnet18_url)
is changed to
state_dict = modelzoo.load_url(resnet18_url, map_location=lambda storage, loc: storage.cuda())
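If the same script also has to run on machines without a GPU, a device-agnostic variant (a sketch, not code from the repository) maps the downloaded weights to whichever device is available instead of forcing storage.cuda():
import torch
import torch.utils.model_zoo as modelzoo  # assuming the repository's alias for model_zoo

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# load_url accepts map_location just like torch.load does
state_dict = modelzoo.load_url(resnet18_url, map_location=device)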
This is not specifically related to ResNet, but the code below should work for the majority of issues of the form "don't know how to restore data location of torch.storage._UntypedStorage (tagged with gpu)":
import torch
import whisper

# Load the Whisper model on the GPU if one is available, otherwise on the CPU
devices = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = whisper.load_model("medium", device=devices)

How to use multiple GPUs in pytorch?

I use this command to use a GPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
But I want to use two GPUs in Jupyter, like this:
device = torch.device("cuda:0,1" if torch.cuda.is_available() else "cpu")
Assuming that you want to distribute the data across the available GPUs (if you have a batch size of 16 and 2 GPUs, you are probably looking to give 8 samples to each GPU), rather than spread parts of the model across different GPUs, this can be done as follows:
If you want to use all the available GPUs:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CreateModel()
model= nn.DataParallel(model)
model.to(device)
If you want to use specific GPUs:
(For example, using 2 out of 4 GPUs)
device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")  ## DataParallel keeps the model on the first entry of device_ids, so pick that id here; GPU ids start from 0.
model = CreateModel()
model = nn.DataParallel(model, device_ids=[1, 3])
model.to(device)
To use specific GPUs by setting an OS environment variable:
Before executing the program, set the CUDA_VISIBLE_DEVICES variable as follows:
export CUDA_VISIBLE_DEVICES=1,3 (assuming you want to select the 2nd and 4th GPU)
Then, within the program, you can just use DataParallel() as though you wanted to use all the GPUs (similar to the 1st case). Here the GPUs visible to the program are restricted by the environment variable.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CreateModel()
model= nn.DataParallel(model)
model.to(device)
In all of these cases, the data has to be moved to the same device. Note that .to() does not move a tensor in place, so keep the returned tensor.
If X and y are the data:
X = X.to(device)
y = y.to(device)
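For example, one training step with the DataParallel-wrapped model could look like this (a sketch; train_loader, criterion and optimizer are placeholder names, not from the question):
for X, y in train_loader:
    X = X.to(device)              # .to() returns a copy; tensors are not moved in place
    y = y.to(device)
    optimizer.zero_grad()
    output = model(X)             # DataParallel splits the batch across the visible GPUs
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()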
Using multiple GPUs is as simple as wrapping a model in DataParallel and increasing the batch size. Check these two tutorials for a quick start:
Multi-GPU Examples
Data Parallelism
Another option would be to use some helper libraries for PyTorch:
PyTorch Ignite library Distributed GPU training
It provides a context manager for distributed configuration on:
nccl - torch native distributed configuration on multiple GPUs
xla-tpu - TPUs distributed configuration
PyTorch Lightning Multi-GPU training
This is possibly the best option, IMHO, to train on CPU/GPU/TPU without changing your original PyTorch code.
Catalyst is also worth checking for similar distributed GPU options.
In 2022, PyTorch says:
It is recommended to use DistributedDataParallel, instead of this class, to do multi-GPU training, even if there is only a single node. See: Use nn.parallel.DistributedDataParallel instead of multiprocessing or nn.DataParallel and Distributed Data Parallel.
in https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html#torch.nn.DataParallel
Thus, it seems that we should use DistributedDataParallel, not DataParallel.
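For reference, a minimal single-node DistributedDataParallel sketch (assuming the script is launched with torchrun --nproc_per_node=<num_gpus> train.py, and reusing the CreateModel placeholder from above) could look like this:
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, WORLD_SIZE and LOCAL_RANK for each spawned process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])   # one process per GPU
    torch.cuda.set_device(local_rank)

    model = CreateModel().cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # gradients are all-reduced across processes

    # ... build a DataLoader with a DistributedSampler and run the usual training loop ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()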
When I ran naiveinception_googlenet, the above methods didn't work for me. The following method solved my problem.
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,3" # specify which GPU(s) to be used
If you want to run your code only on specific GPUs (e.g. only on GPU ids 2 and 3), you can specify that with the CUDA_VISIBLE_DEVICES=2,3 variable when launching the Python code from the terminal.
CUDA_VISIBLE_DEVICES=2,3 python lstm_demo_example.py --epochs=30 --lr=0.001
and inside the code, leave it as:
device = torch.device("cuda" if torch.cuda.is_available() else 'cpu')
model = LSTMModel()
model = nn.DataParallel(model)
model = model.to(device)
Source : https://glassboxmedicine.com/2020/03/04/multi-gpu-training-in-pytorch-data-and-model-parallelism/

Using CUDA with pytorch?

Is there a way to reliably enable CUDA on the whole model?
I want to run the training on my GPU. I found on some forums that I need to apply .cuda() on anything I want to use CUDA with (I've applied it to everything I could without making the program crash). Surprisingly, this makes the training even slower.
Then, I found that you could use torch.set_default_tensor_type('torch.cuda.FloatTensor') to use CUDA. With both enabled, nothing changes. What is happening?
You can use the tensor.to(device) command to move a tensor to a device.
The .to() command is also used to move a whole model to a device, like in the post you linked to.
Another possibility is to set the device of a tensor during creation using the device= keyword argument, like in t = torch.tensor(some_list, device=device)
To set the device dynamically in your code, you can use
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
to set cuda as your device if possible.
There are various code examples on PyTorch Tutorials and in the documentation linked above that could help you.
With both enabled, nothing changes.
That is because you have already moved every tensor to the GPU.
Is there a way to reliably enable CUDA on the whole model?
model.to('cuda')
I've applied it to everything I could
You only need to apply it to the tensors the model will be interacting with, generally:
the model's parameters: model.to('cuda')
the feature data: features = features.to('cuda')
the target data: targets = targets.to('cuda')
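Putting the three together (a small sketch; the Linear model and the random data are placeholders, not from the question):
import torch

model = torch.nn.Linear(10, 2).to('cuda')             # parameters on the GPU
features = torch.randn(32, 10, device='cuda')         # features created directly on the GPU
targets = torch.randint(0, 2, (32,), device='cuda')   # targets on the GPU
loss = torch.nn.functional.cross_entropy(model(features), targets)
loss.backward()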

Use multiple GPUs for inception_v3 model in TF slim

I am trying to train a slim model using 3 GPUs.
I am specifically telling TF to use the second GPU to allocate the model:
with tf.device('device:GPU:1'):
    logits, end_points = inception_v3(inputs)
However, I get an OOM error on that GPU every time I run my code. I've tried reducing the batch_size so the model fits in memory, but then the network is ruined.
I own 3 GPUs, so is there a way to tell TF to use my third GPU when the second is full? I've tried not telling TF to use any GPU and allowing soft placement, but it is not working either.
This statement with tf.device('device:GPU:1') tells tensorflow specifically to use GPU-1, so it won't attempt to use any other device you have.
When the model is too big, the recommended way is to use model parallelism via manually splitting your graph into different GPUs. The complication in your case is that the model definition is in the library, so you can't insert tf.device statements for different layers unless you patch tensorflow.
But there is a workaround
You can define and place the variables before invoking the inception_v3 builder. This way inception_v3 will reuse these variables and not change their placement. Example:
with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
    with tf.device('device:GPU:1'):
        tf.get_variable("InceptionV3/Logits/Conv2d_1c_1x1/biases", shape=[1000])
        tf.get_variable("InceptionV3/Logits/Conv2d_1c_1x1/weights", shape=[1, 1, 2048, 1000])
    with tf.device('device:GPU:0'):
        logits, end_points = inception_v3(inputs)
Upon running, you'll see that all variables except Conv2d_1c_1x1 are placed onto GPU-0, while Conv2d_1c_1x1 layer is on GPU-1.
The drawback is that you need to know the shape of each variable you want to replace. But it is doable and at least can get your model running.
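One way to find those names and shapes (a sketch, assuming TF 1.x and that the same inception_v3 builder used above is importable) is to build the model once in a scratch graph and print every variable:
import tensorflow as tf

with tf.Graph().as_default():
    inputs = tf.placeholder(tf.float32, [None, 299, 299, 3])
    logits, end_points = inception_v3(inputs)   # same builder as in the answer
    for v in tf.global_variables():
        print(v.name, v.shape)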
