I downloaded dgllife and ran the pubchem_aromaticity example (https://github.com/chaoyue729/dgl-lifesci/tree/master/examples/property_prediction/pubchem_aromaticity), but it always reports an error. When I change args['device'] to "cpu" it runs, but it's too slow. I need to run it on CUDA. How can I fix this?
def main(args):
    args['device'] = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
    # args['device'] = torch.device("cpu")
    ......
dgl._ffi.base.DGLError: Cannot assign node feature "hv" on device cuda:0 to a graph on device cpu. Call DGLGraph.to() to copy the graph to the same device.
I guess the cause of the error is that bg in line 46 of main.py, whose type is "dgl.heterograph.DGLHeteroGraph", is not copied to CUDA. Reference: https://docs.dgl.ai/guide_cn/graph-gpu.html?highlight=dglerror. But I don't know how to set it.
I have solved this problem. The solution is to add some code to the regress function in main.py:
def regress(args, model, bg):
    atom_feats, bond_feats = bg.ndata.pop('hv'), bg.edata.pop('he')
    atom_feats, bond_feats = atom_feats.to(args['device']), bond_feats.to(args['device'])
    bg = bg.to(args['device'])
    return model(bg, atom_feats, bond_feats)
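An alternative sketch (the loop unpacking below is an assumption about what the example's data loader yields, not taken verbatim from main.py): move the batched graph to the target device once per batch in the training loop, so its node and edge features travel with it.

# Sketch of the training-loop variant; smiles, labels and masks are illustrative names.
for batch_id, batch_data in enumerate(data_loader):
    smiles, bg, labels, masks = batch_data
    bg = bg.to(args['device'])       # DGLGraph.to() moves the graph and its features together
    labels = labels.to(args['device'])
    prediction = regress(args, model, bg)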
import torch
import torch.nn as nn
import os

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.h = -1

    def forward(self, x):
        self.h = x

os.environ['CUDA_VISIBLE_DEVICES'] = '0'
if torch.cuda.is_available():
    print('using Cuda devices, num:', torch.cuda.device_count())
model = nn.DataParallel(Net())
x = 2
print(model.module.h)
model(x)
print(model.module.h)
When I use multiple GPUs to train my model, I find that the Net's param 'h' can't be updated correctly; it stays at its initial value. However, when I use only one GPU, it is updated correctly. How can I fix this problem? Thanks! (The examples are posted in the images.)
This is when I use two GPUs; the param 'h' didn't change:
This is when I use only one GPU; the param 'h' did change:
From the PyTorch documentation (https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html):
In each forward, module is replicated on each device, so any updates to the running module in forward will be lost. For example, if module has a counter attribute that is incremented in each forward, it will always stay at the initial value because the update is done on the replicas which are destroyed after forward.
I am guessing PyTorch skips the copying part when there is only one GPU.
Also, your h is just an attribute. It is not a "parameter" in PyTorch.
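A minimal sketch of one way around this (my own example, not from the question): return values from forward and keep them outside the module, since DataParallel runs forward on temporary replicas whose attribute updates are discarded.

import torch
import torch.nn as nn

class Net(nn.Module):
    def forward(self, x):
        h = x * 2      # computed on whichever replica handles this chunk of the batch
        return h       # returned tensors are gathered back onto the primary device

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.DataParallel(Net()).to(device)
x = torch.arange(4.0, device=device)
h = model(x)           # keep the result outside the module instead of on self.h
print(h)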
As mentioned in How to tell PyTorch to not use the GPU?, in order to tell PyTorch not to use the GPU you should change a few lines inside PyTorch code.
Where should I make the change?
Where is the line of code that needs to be modified?
I tried to find it but couldn't...
Using the method .cpu() on any tensor or PyTorch module transfers that component to the CPU, so the computations are performed there.
Another option is the method .to("cpu"). Alternatively, you can replace "cpu" with the name of another device, such as "cuda".
Example:
a)
model = MyModel().cpu() # move the model to the cpu
x = data.cpu() # move the input to the cpu
y = model(x)
b)
model = MyModel().to('cpu') # move the model to the cpu
x = data.to('cpu') # move the input to the cpu
y = model(x)
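A related pattern worth knowing (a sketch; MyModel and data are the same placeholders as in the examples above): pick the device once and reuse it everywhere, so switching between CPU and GPU is a one-line change.

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = MyModel().to(device)  # same placeholder model as above
x = data.to(device)           # 'data' is assumed to be a tensor
y = model(x)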
I saved a checkpoint while training on GPU. After reloading the checkpoint and continuing training, I get the following error:
Traceback (most recent call last):
File "main.py", line 140, in <module>
train(model,optimizer,train_loader,val_loader,criteria=args.criterion,epoch=epoch,batch=batch)
File "main.py", line 71, in train
optimizer.step()
File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/optim/sgd.py", line 106, in step
buf.mul_(momentum).add_(d_p, alpha=1 - dampening)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
My training code is as follows:
def train(model, optimizer, train_loader, val_loader, criteria, epoch=0, batch=0):
    batch_count = batch
    if criteria == 'l1':
        criterion = L1_imp_Loss()
    elif criteria == 'l2':
        criterion = L2_imp_Loss()
    if args.gpu and torch.cuda.is_available():
        model.cuda()
        criterion = criterion.cuda()

    print(f'{datetime.datetime.now().time().replace(microsecond=0)} Starting to train..')

    while epoch <= args.epochs - 1:
        print(f'********{datetime.datetime.now().time().replace(microsecond=0)} Epoch#: {epoch+1} / {args.epochs}')
        model.train()
        interval_loss, total_loss = 0, 0
        for i, (input, target) in enumerate(train_loader):
            batch_count += 1
            if args.gpu and torch.cuda.is_available():
                input, target = input.cuda(), target.cuda()
            input, target = input.float(), target.float()
            pred = model(input)
            loss = criterion(pred, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            ....
The saving process happened after finishing each epoch.
torch.save({'epoch': epoch, 'batch': batch_count, 'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(), 'loss': total_loss/len(train_loader),
            'train_set': args.train_set, 'val_set': args.val_set, 'args': args},
           f'{args.weights_dir}/FastDepth_Final.pth')
I can't figure out why I get this error. args.gpu == True, and I'm moving the model, all the data, and the loss function to CUDA, yet somehow there is still a tensor on the CPU. Could anyone figure out what's wrong?
Thanks.
There might be an issue with the device the parameters are on:
If you need to move a model to GPU via .cuda() , please do so before constructing optimizers for it. Parameters of a model after .cuda() will be different objects with those before the call.
In general, you should make sure that optimized parameters live in consistent locations when optimizers are constructed and used.
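A minimal sketch of how that applies to the training code above (my own reconstruction; the SGD hyperparameters are placeholders and the checkpoint keys are taken from the question's torch.save call): move the model to the device before (re)creating the optimizer, and make sure any loaded optimizer state ends up on the same device.

device = torch.device('cuda' if args.gpu and torch.cuda.is_available() else 'cpu')

checkpoint = torch.load(f'{args.weights_dir}/FastDepth_Final.pth', map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.to(device)                                   # move the model BEFORE building the optimizer

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # placeholder hyperparameters
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])

# Belt and braces: move the optimizer's own state (e.g. momentum buffers)
# onto the same device as the parameters.
for state in optimizer.state.values():
    for k, v in state.items():
        if torch.is_tensor(v):
            state[k] = v.to(device)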
Make sure to add .to(device) to both the model and the model inputs.
For me it worked to add
model.to('cuda')
right after setting up my model:
class Agent:
    def __init__(self):
        self.n_game = 0
        self.epsilon = 0                        # randomness
        self.gamma = 0.9                        # discount rate
        self.memory = deque(maxlen=MAX_MEMORY)  # popleft()
        self.model = Linear_QNet(11, 256, 3)    # here
        self.model.to('cuda')                   # and here
        self.trainer = QTrainer(self.model, lr=LR, gamma=self.gamma)
Adding the two lines below resolved the issue for me on Colab (add them in both saving and loading):
device = torch.device("cuda")
model.cuda()
Note: if you are using Google Colab, you obviously need to set your Colab runtime to GPU.
I'm going through the Fast AI 2022 course and trying to use my M1 Max. I've found that at least with some of the Fastbook code, I could set default_device(torch.device("mps")) and it would resolve my problems.
Here is a reusable snippet that I put at the top of the Jupyter Notebooks I've been dabbling in:
# Check that MPS is available
if not torch.backends.mps.is_available():
    if not torch.backends.mps.is_built():
        print("MPS not available because the current PyTorch install was not "
              "built with MPS enabled.")
    else:
        print("MPS not available because the current MacOS version is not 12.3+ "
              "and/or you do not have an MPS-enabled device on this machine.")
else:
    print("MPS is available. Setting as default device.")
    mps_device = torch.device("mps")
    default_device(mps_device)
I added the code below at the start of the file; it solved my issue:
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
This answer by Shirley Ow helped me:
Make sure to add .to(device) to both the model and the model inputs.
img = torch.from_numpy(img).to(device) # Code in yolov7
I think that after you load the model, it is no longer on the GPU. Try:
model = AutoModelForSequenceClassification.from_pretrained(output_dir).to(device)
I encounter a RuntimeError while trying to run the code on my machine's CPU instead of the GPU. The code is originally from this GitHub project: IBD: Interpretable Basis Decomposition for Visual Explanation. This is for a research project. I tried setting GPU to False and looked at other solutions on this website.
GPU = False # running on GPU is highly suggested
CLEAN = False # set to "True" if you want to clean the temporary large files after generating result
APP = "classification" # Do not change! mode choide: "classification", "imagecap", "vqa". Currently "imagecap" and "vqa" are not supported.
CATAGORIES = ["object", "part"] # Do not change! concept categories that are chosen to detect: "object", "part", "scene", "material", "texture", "color"
CAM_THRESHOLD = 0.5 # the threshold used for CAM visualization
FONT_PATH = "components/font.ttc" # font file path
FONT_SIZE = 26 # font size
SEG_RESOLUTION = 7 # the resolution of cam map
BASIS_NUM = 7 # In decomposition, this is to decide how many concepts are used to interpret the weight vector of a class.
Here is the error:
Traceback (most recent call last):
File "test.py", line 22, in <module>
model = loadmodel()
File "/home/joshuayun/Desktop/IBD/loader/model_loader.py", line 48, in loadmodel
checkpoint = torch.load(settings.MODEL_FILE)
File "/home/joshuayun/.local/lib/python3.6/site-packages/torch/serialization.py", line 387, in load
return _load(f, map_location, pickle_module, **pickle_load_args)
File "/home/joshuayun/.local/lib/python3.6/site-packages/torch/serialization.py", line 574, in _load
result = unpickler.load()
File "/home/joshuayun/.local/lib/python3.6/site-packages/torch/serialization.py", line 537, in persistent_load
deserialized_objects[root_key] = restore_location(obj, location)
File "/home/joshuayun/.local/lib/python3.6/site-packages/torch/serialization.py", line 119, in default_restore_location
result = fn(storage, location)
File "/home/joshuayun/.local/lib/python3.6/site-packages/torch/serialization.py", line 95, in _cuda_deserialize
device = validate_cuda_device(location)
File "/home/joshuayun/.local/lib/python3.6/site-packages/torch/serialization.py", line 79, in validate_cuda_device
raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but
torch.cuda.is_available() is False. If you are running on a CPU-only machine,
please use torch.load with map_location='cpu' to map your storages to the CPU.
If you don't have a GPU, pass map_location=torch.device('cpu') to torch.load():
my_model = net.load_state_dict(torch.load('classifier.pt', map_location=torch.device('cpu')))  # load_state_dict updates net in place; its return value is a key-match report, not the model
Just giving a smaller answer. To solve this, you could change the default parameter of the load() function in the serialization.py file, which is stored in ./site-packages/torch/serialization.py.
Write:
def load(f, map_location='cpu', pickle_module=pickle, **pickle_load_args):
instead of:
def load(f, map_location=None, pickle_module=pickle, **pickle_load_args):
Hope it helps.
"If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU."
model = torch.load('model/pytorch_resnet50.pth',map_location ='cpu')
I tried adding map_location='cpu' to the load function, but it didn't work for me.
If you use a model trained on a GPU on a CPU-only computer, you may run into this bug. You can try this solution:
import io
import pickle
import torch

class CPU_Unpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == 'torch.storage' and name == '_load_from_bytes':
            return lambda b: torch.load(io.BytesIO(b), map_location='cpu')
        else:
            return super().find_class(module, name)

contents = CPU_Unpickler(f).load()  # f is an open binary file object, e.g. f = open('model.pkl', 'rb')
You can remap the Tensor location at load time using the map_location argument to torch.load.
In the repository's test.py, model = loadmodel() calls the model_loader.py file to load the model with torch.load().
Note that this mapping only covers storages from GPU 0; add the map_location like so:
torch.load(settings.MODEL_FILE, map_location={'cuda:0': 'cpu'})
In the model_loader.py file, add map_location={'cuda:0': 'cpu'} wherever torch.load() is called.
As you state, the problem hints that you are trying to use a CUDA model on a non-CUDA machine. Pay attention to the details of the error message: "please use torch.load with map_location='cpu' to map your storages to the CPU." I had a similar problem when I tried to load a pre-trained model from a checkpoint on my CPU-only machine. The model was trained on a CUDA machine, so it couldn't be loaded properly. Once I added the map_location='cpu' argument to the load method, everything worked.
I faced the same problem. Instead of modifying the existing code, which had been running fine the day before, I first checked whether my GPU was free by running
nvidia-smi
I could see that it was underutilized, so as a traditional solution I shut down the laptop, restarted it, and it started working again.
(One thing I kept in mind: it had been working earlier and I hadn't changed anything in the code, so it should work again after a restart, and it did; I was able to use the GPU.)
For some reason, this also happens with Portainer, even though your machines have GPUs. A crude solution is to just restart it. It usually happens if you fiddle with the state of the container after it has been deployed (e.g. you change the restart policies while the container is running), which makes me think it's some Portainer issue.
Nothing worked for me. My pickle was a custom object, in a script file with the line
device = torch.device("cuda")
Finally, I managed to take Spike's solution and adapt it to my needs with a simple open(path, "rb"), so for any other unfortunate developers:
class CPU_Unpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == 'torch.storage' and name == '_load_from_bytes':
            return lambda b: torch.load(io.BytesIO(b), map_location='cpu')
        else:
            return super().find_class(module, name)

contents = CPU_Unpickler(open(path, "rb")).load()
There is a much easier way: just pass map_location='cpu' to torch.load():
def load_checkpoint(path) -> 'LanguageModel':
    checkpoint = torch.load(path, map_location='cpu')
    model = LanguageModel(
        number_of_tokens=checkpoint['number_of_tokens'],
        max_sequence_length=checkpoint['max_sequence_length'],
        embedding_dimension=checkpoint['embedding_dimension'],
        number_of_layers=checkpoint['number_of_layers'],
        number_of_heads=checkpoint['number_of_heads'],
        feed_forward_dimension=checkpoint['feed_forward_dimension'],
        dropout_rate=checkpoint['dropout_rate']
    ).to(get_device())
    model.load_state_dict(checkpoint['model_state_dict'])
    return model.to(get_device())
I'm using the following setup:
Fedora 26, NVIDIA GTX 970, CUDA 8.0, cuDNN 6.0 and tensorflow-gpu==1.3.0 on Python.
My problem is that when I force a dynamic_rnn operator to run on my GPU using:
with tf.name_scope('encoder_both_rnn'), tf.device('/gpu:0'):
    _, encoder_state_final_forward = tf.nn.dynamic_rnn(self.encoder_cell_forward, input_ph, dtype=tf.float32, time_major=False, sequence_length=sequence_length, scope='encoder_rnn_forward')
    _, encoder_state_final_reverse = tf.nn.dynamic_rnn(self.encoder_cell_reverse, input_reverse, dtype=tf.float32, time_major=False, sequence_length=sequence_length, scope='encoder_rnn_reverse')
I receive the following error when calling the global variable initializer:
InvalidArgumentError: Node 'init/NoOp': Unknown input node '^drawlog_vae_test.DrawlogVaeTest.queue_training/encoder/encoder/encoder_W_mean/Variable/Assign'
The variable is created using the following statement:
self.encoder_W_mean = u.weight_variable([self.intermediate_state_size * 2,self.intermediate_state_size*2],name='encoder_W_mean')
with
def weight_variable(shape, name=None, use_lambda_init=False):
    with tf.name_scope(name):
        num_weights = float(reduce(lambda x, y: x * y, shape))
        initial = tf.truncated_normal(shape, stddev=1) * math.sqrt(2.0 / num_weights)
        if use_lambda_init:
            initial = lambda: np.random.normal(size=shape)
        return tf.Variable(initial, dtype=tf.float32)
The strange thing about this is that the variable has nearly nothing to do with the two RNNs. Is there any chance of running my RNN on the GPU, or is this just a strange error telling me I can't run an RNN on a GPU?