I'm using the following setup:
Fedora 26, NVIDIA GTX 970, CUDA 8.0, cuDNN 6.0 and tensorflow-gpu==1.3.0 on Python.
My problem is that, when forcing a dynamic_rnn operator to run on my GPU using:
with tf.name_scope('encoder_both_rnn'),tf.device('/gpu:0'):
    _, encoder_state_final_forward = tf.nn.dynamic_rnn(self.encoder_cell_forward,input_ph,dtype=tf.float32,time_major=False,sequence_length=sequence_length,scope='encoder_rnn_forward')
    _, encoder_state_final_reverse = tf.nn.dynamic_rnn(self.encoder_cell_reverse,input_reverse,dtype=tf.float32,time_major=False,sequence_length=sequence_length,scope='encoder_rnn_reverse')
I receive the following error when calling the global variable initializer:
InvalidArgumentError: Node 'init/NoOp': Unknown input node '^drawlog_vae_test.DrawlogVaeTest.queue_training/encoder/encoder/encoder_W_mean/Variable/Assign'
The variable is created using the following statement:
self.encoder_W_mean = u.weight_variable([self.intermediate_state_size * 2,self.intermediate_state_size*2],name='encoder_W_mean')
with
def weight_variable(shape,name=None,use_lambda_init=False):
    with tf.name_scope(name):
        num_weights = float(reduce(lambda x,y: x*y,shape))
        initial = tf.truncated_normal(shape,stddev=1) * math.sqrt(2.0/num_weights)
        if use_lambda_init:
            initial = lambda: np.random.normal(size=shape)
        return tf.Variable(initial,dtype=tf.float32)
The strange thing is that this variable has almost nothing to do with the two RNNs. Is there any way to run my RNN on the GPU? Or is this just a strange way of telling me that I can't run an RNN on a GPU?
I am facing a memory leak when iteratively updating tensors in PyTorch on my Mac M1 GPU using the PyTorch mps interface. The following is a minimal reproducible example that replicates the behavior:
import torch
def leak_example(p1, device):
    t1 = torch.rand_like(p1, device = device) # torch.cat((torch.diff(ubar.detach(), dim=0).detach().clone(), torch.zeros_like(ubar.detach()[:1,:,:,:], dtype = torch.float32)), dim = 0)
    u1 = p1.detach() + 2 * (t1.detach())
    B = torch.rand_like(u1, device = device)
    mask = u1 < B
    a1 = u1.detach().clone()
    a1[~mask] = torch.rand_like(a1)[~mask]
    return a1
if torch.cuda.is_available(): # cuda gpus
    device = torch.device("cuda")
elif torch.backends.mps.is_available(): # mac gpus
    device = torch.device("mps")
torch.set_grad_enabled(False)
p1 = torch.rand(5, 5, 224, 224, device = device)
for i in range(10000):
    p1 = leak_example(p1, device)
My Mac's GPU memory steadily grows when I execute this loop. I have tried running it on a CUDA GPU in Google Colab and it seems to be behaving similarly, with the GPU's Active memory, Non-releasable memory, and Allocated memory increasing as the loop progresses.
I have tried detaching and cloning the tensors and using weakrefs, to no avail. Interestingly, if I don't reassign the output of leak_example to p1, the behavior disappears, so it really seems related to the recursive assignment. Does anyone have any idea how I could resolve this?
I think I found the cause of the leak: it was the masked assignment. Replacing it with an equivalent torch.where() statement makes the leak disappear. I imagine this is related to masked_scatter not being implemented for MPS support in PyTorch (yet)?
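For reference, here is a minimal sketch of what the torch.where() variant might look like. The function names replace_where_false and leak_free_example are hypothetical (they are not from the original post), and the sketch assumes the intent of the masked assignment is to keep u1 where mask is True and draw fresh random values everywhere else.

```python
import torch


def replace_where_false(values, mask, replacement):
    # Keep `values` where `mask` is True, take `replacement` elsewhere.
    # Same result as: a = values.clone(); a[~mask] = replacement[~mask]
    return torch.where(mask, values, replacement)


def leak_free_example(p1, device):
    # Hypothetical rewrite of leak_example without the in-place masked assignment.
    t1 = torch.rand_like(p1, device=device)
    u1 = p1.detach() + 2 * t1
    B = torch.rand_like(u1, device=device)
    mask = u1 < B
    fresh = torch.rand_like(u1)
    return replace_where_false(u1, mask, fresh)
```

The loop from the question would then call leak_free_example(p1, device) in place of leak_example(p1, device). Whether this removes the memory growth on a given PyTorch version is something to verify locally.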
I downloaded dgllife and ran the pubchem_aromaticity example (https://github.com/chaoyue729/dgl-lifesci/tree/master/examples/property_prediction/pubchem_aromaticity).
But it always reports an error. When I set args['device']="cpu" it runs, but it is too slow. I need to run it on CUDA. How can I fix it?
def main(args):
    args['device'] = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
    #args['device'] = torch.device("cpu")
    ......
dgl._ffi.base.DGLError: Cannot assign node feature "hv" on device cuda:0 to a graph on device cpu. Call DGLGraph.to() to copy the graph to the same device.
I guess the cause of the error is that bg on line 46 of main.py, whose type is "dgl.heterograph.DGLHeteroGraph", is not being copied to CUDA. Reference (https://docs.dgl.ai/guide_cn/graph-gpu.html?highlight=dglerror). But I don't know how to set it.
I have solved this problem. The solution is to add some code to the regress function in main.py, so that both the features and the graph are moved to the target device:
def regress(args, model, bg):
    atom_feats, bond_feats = bg.ndata.pop('hv'), bg.edata.pop('he')
    atom_feats, bond_feats = atom_feats.to(args['device']), bond_feats.to(args['device'])
    bg = bg.to(args['device'])
    return model(bg, atom_feats, bond_feats)
In tensorflow 1.X with standalone keras 2.X, I used to switch between training on GPU, and running inference on CPU (much faster for some reason for my RNN models) with the following snippet:
keras.backend.clear_session()
def set_session(gpus: int = 0):
    num_cores = cpu_count()
    config = tf.ConfigProto(
        intra_op_parallelism_threads=num_cores,
        inter_op_parallelism_threads=num_cores,
        allow_soft_placement=True,
        device_count={"CPU": 1, "GPU": gpus},
    )
    session = tf.Session(config=config)
    k.set_session(session)
This ConfigProto functionality is no longer available in tensorflow 2.0 (there I'm using the integrated tensorflow.keras). In the beginning, it is possible to run tf.config.experimental.set_visible_devices() in order to e.g. disable the GPU, but any subsequent calls to set_visible_devices result in RuntimeError: Visible devices cannot be modified after being initialized. Is there a way of re-initializing the visible devices or is there another way of switching the devices available?
You can use tf.device to explicitly set which device you want to use. For example:
import tensorflow as tf
model = tf.keras.Model(...)
# Run training on GPU
with tf.device('/gpu:0'):
    model.fit(...)
# Run inference on CPU
with tf.device('/cpu:0'):
    model.predict(...)
If you only have one CPU and one GPU, the names used above should work. Otherwise, device_lib.list_local_devices() can give you a list of your devices. This post gives a nice function for listing just the names, which I adapt here to also show CPUs:
from tensorflow.python.client import device_lib
def get_available_devices():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU' or x.device_type == 'CPU']
Would using tf.device help?
With that, you can set some operations either on CPU or on GPU.
I would just restart the kernel; this worked for me.
I am using the python API of TensorFlow to train a variant of an LSTM.
For that purpose I use the tf.while_loop function to iterate over the time steps.
When running my script on the cpu, it does not produce any error messages, but on the gpu python crashes due to:
...tensorflow/tensorflow/core/framework/tensor.cc:885] Check failed: nullptr != b.buf_ (nullptr vs. 00...)
The part of my code that causes this failure (it works when I comment it out) is in the body of the while loop:
...
h_gathered = h_ta.gather(tf.range(time))
h_gathered = tf.transpose(h_gathered, [1, 0, 2])
syn_t = self.syntactic_weights_ta.read(time)[:, :time]
syn_t = tf.expand_dims(syn_t, 1)
syn_state_t = tf.squeeze(tf.tanh(tf.matmul(syn_t, h_gathered)), 1)
...
where time is zero based and incremented after each step, h_ta is a TensorArray
h_ta = tf.TensorArray(
    dtype=dtype,
    size=max_seq_len,
    clear_after_read=False,
    element_shape=[batch_size, num_hidden],
    tensor_array_name="fw_output")
and self.syntactic_weights_ta is also a TensorArray
self.syntactic_weights_ta = tf.TensorArray(
    dtype=dtype,
    size=max_seq_len,
    tensor_array_name="fw_syntactic_weights")
self.syntactic_weights_ta = self.syntactic_weights_ta.unstack(syntactic_weights)
What I am trying to achieve in the code snippet is basically a weighted sum over the past outputs, stored in h_ta.
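To make the intended computation concrete, here is a NumPy sketch of the same weighted sum outside of TensorFlow. The function name weighted_past_state is hypothetical, and the shapes are an assumption read off the snippet: past outputs as [time, batch, hidden] (what h_ta.gather returns) and weights as [batch, time] (the slice of syntactic_weights_ta.read(time)).

```python
import numpy as np


def weighted_past_state(h_past, weights):
    """tanh of a per-example weighted sum over past outputs.

    h_past:  [time, batch, hidden]
    weights: [batch, time]
    returns: [batch, hidden]
    """
    h_bt = np.transpose(h_past, (1, 0, 2))  # [batch, time, hidden]
    w = weights[:, np.newaxis, :]           # [batch, 1, time]
    summed = np.matmul(w, h_bt)             # [batch, 1, hidden]
    return np.tanh(summed)[:, 0, :]         # drop the singleton axis
```

This mirrors the transpose, expand_dims, matmul, tanh and squeeze steps in the loop body, so it can serve as a CPU reference when checking the GPU result.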
In the end I train the network with tf.train.AdamOptimizer.
I have tested the script again, but this time with swap_memory parameter in the while loop set to False and it works on GPU as well, though I'd really like to know why it does not work with swap_memory=True.
This looks like a bug in the way that TensorArray's tensor storage mechanisms interact with the allocation magic that is performed by while_loop when swap_memory=True.
Can you open an issue on TF's github? Please also include:
A full stack trace (TF built with -c dbg preferable)
A minimal code example to reproduce
Describe whether the issue requires you to be calling backprop.
Whether this is reproducible in TF 1.2 / nightlies / master branch.
And respond here with the link to the github issue?
If I run the following code with python 3.5
import numpy as np
import time
import theano
A = np.random.rand(1000,10000).astype(theano.config.floatX)
B = np.random.rand(10000,1000).astype(theano.config.floatX)
np_start = time.time()
AB = A.dot(B)
np_end = time.time()
X,Y = theano.tensor.matrices('XY')
mf = theano.function([X,Y],X.dot(Y))
t_start = time.time()
tAB = mf(A,B)
t_end = time.time()
print ("NP time: %f[s], theano time: %f[s] **(times should be close when run on CPU!)**" %(np_end-np_start, t_end-t_start))
print ("Result difference: %f" % (np.abs(AB-tAB).max(), ))
I get the output
NP time: 0.161123[s], theano time: 0.167119[s] (times should be close when run on CPU!)
Result difference: 0.000000
The message says that if the times are close, I am running on my CPU.
How can I run this code on my GPU?
NOTE:
I have a workstation with an Nvidia Quadro K4200.
I have installed the CUDA toolkit.
I have successfully run a CUDA vectorAdd sample project in VS2012.
You configure Theano to use a GPU by specifying device=gpu in Theano's config. There are two principal methods for setting the config: (1) in the THEANO_FLAGS environment variable, or (2) via the .theanorc file. Both methods, and all of Theano's configuration flags, are documented.
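As a rough sketch of method (1), the flags can also be set from inside a Python script, as long as that happens before theano is imported. The helper name build_theano_flags is hypothetical, and the floatX=float32 default is an assumption (it is the usual choice for GPU work, but adjust it to your needs).

```python
import os


def build_theano_flags(device="gpu", floatX="float32"):
    """Return a THEANO_FLAGS string such as 'device=gpu,floatX=float32'."""
    return "device={0},floatX={1}".format(device, floatX)


# Theano reads THEANO_FLAGS when it is first imported,
# so the variable has to be set before the import below.
os.environ["THEANO_FLAGS"] = build_theano_flags()
# import theano  # uncomment in a real script
```

Setting the variable in the shell before launching Python has the same effect and keeps the script itself unchanged.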
You will know that Theano is using the GPU if, after calling import theano you see a message that looks something like this
Using gpu device 0: GeForce GT 640 (CNMeM is disabled)
The details may vary for you but if no message appears at all then Theano is using the CPU only.
Note also that even if you see the GPU message, your particular computation graph may not run on the GPU. To see which parts of your computation are running on the GPU, print its compiled and optimized graph:
f = theano.function(...)
theano.printing.debugprint(f)
Operations that start with the prefix 'Gpu' will run on the GPU. Operations that do not have that prefix to their name will run on the CPU.
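If the printed graph is long, the prefix rule can be applied mechanically. The sketch below is only an illustration: split_ops_by_prefix is a hypothetical helper, and it assumes you have already collected the op names as strings (for example from type(node.op).__name__ for each node in f.maker.fgraph.toposort(), which is an assumption about how you gather them). The sample names are illustrative, not output from a real run.

```python
def split_ops_by_prefix(op_names, prefix="Gpu"):
    """Split op names into (names starting with the prefix, all other names)."""
    with_prefix, without_prefix = [], []
    for name in op_names:
        if name.startswith(prefix):
            with_prefix.append(name)
        else:
            without_prefix.append(name)
    return with_prefix, without_prefix


# Illustrative op names only
sample = ["GpuFromHost", "GpuDot22", "HostFromGpu"]
gpu_ops, other_ops = split_ops_by_prefix(sample)
```

An empty first list would suggest that nothing in the function was moved to the GPU.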
If you are on Linux, create a .theanorc file in your home folder and add the following to set up theano to run on GPU.
[global]
device = gpu
floatx = float32
Alternatively, if you want to use the GPU programmatically:
import theano.sandbox.cuda
theano.sandbox.cuda.use("gpu0")
You should see a message like this:
Using gpu device 0: Tesla K80
This is useful if the environment you are running in isn't easy to configure.