Convert CudaNdarraySharedVariable to TensorVariable

I'm trying to convert a pylearn2 GPU model to a CPU-compatible version for prediction on a remote server. How can I convert each CudaNdarraySharedVariable to a TensorVariable, to avoid an error from calling CUDA code on a GPU-less machine? The experimental Theano flag unpickle_gpu_to_cpu seems to have left a few CudaNdarraySharedVariables hanging around (specifically model.layers[n].transformer._W).

For a plain CudaNdarray variable, something like this should work:
x = CudaNdarray...
x_new = theano.tensor.TensorVariable(CudaNdarrayType([False] * tensor_dim))
f = theano.function([x_new], x_new)
converted_x = f(x)
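For the lingering shared variables, would something like this work? A minimal sketch, assuming each CudaNdarraySharedVariable supports get_value() and that transformer._W can simply be reassigned (both unverified assumptions about the pylearn2 internals):
import numpy as np
import theano

# Hypothetical sketch: copy each GPU-backed weight to the host and
# replace it with a plain CPU shared variable of the same name.
for layer in model.layers:
    W_gpu = layer.transformer._W
    W_cpu = theano.shared(np.asarray(W_gpu.get_value()), name=W_gpu.name)
    layer.transformer._W = W_cpu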

Related

Get the number of GPUs used in Tensorflow Distributed in a multi node approach

I am currently trying to compare Horovod and Tensorflow Distributed API.
When using Horovod, I am able to access the total number of GPUs currently used as follows:
import horovod.tensorflow as hvd
size = hvd.size()
A similar concept is available when using PyTorch Distributed API:
size = int(os.environ["WORLD_SIZE"])
I would like to perform the same operation and obtain the number of GPUs currently in use across multiple GPUs/nodes with the official TF Distributed API.
I can't use the CUDA_VISIBLE_DEVICES environment variable, as it would only work on a single node.
A few findings which answer my question:
Equivalent of hvd.size() (unlike with hvd, the session must be started and initialized first, else you will just get "1"):
tf.distribute.get_strategy().num_replicas_in_sync
Equivalent of hvd.rank() (same caveat as above, else you will just get "0"):
def get_rank():
    replica_id = tf.distribute.get_replica_context().replica_id_in_sync_group
    if isinstance(replica_id, tf.Tensor):
        return tf.get_static_value(replica_id)
    else:
        return 0
Is TF Distributed running? tf.distribute.has_strategy() returns True/False (same caveat as above, else you just get False).
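For example, a minimal single-node sketch of these calls using MirroredStrategy (a multi-node job would build a MultiWorkerMirroredStrategy instead, but the queries are the same):
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # one replica per local GPU
print(strategy.num_replicas_in_sync)         # e.g. the number of GPUs found

print(tf.distribute.has_strategy())          # False: not inside a scope yet
with strategy.scope():
    print(tf.distribute.has_strategy())      # True
    print(tf.distribute.get_strategy().num_replicas_in_sync)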

Tensorflow objects not displaying

I just installed TensorFlow using conda install tensorflow from the Anaconda prompt. I'm using Python 3.6 on Windows 10.
I thought I'd try it out with something simple, like
rnd_ints = tf.random_normal([10], dtype=tf.float64)
When I call rnd_ints all I get is this:
<tf.Tensor 'random_normal:0' shape=(10,) dtype=float64>
I thought I was supposed to be getting an array object of some sort?
From the documentation:
A Tensor is a symbolic handle to one of the outputs of an Operation.
It does not hold the values of that operation's output, but instead
provides a means of computing those values in a TensorFlow tf.Session.
This class has two primary purposes:
A Tensor can be passed as an input to another Operation. This builds a
dataflow connection between operations, which enables TensorFlow to
execute an entire Graph that represents a large, multi-step
computation.
After the graph has been launched in a session, the value of the
Tensor can be computed by passing it to tf.Session.run. t.eval() is a
shortcut for calling tf.get_default_session().run(t).
The answer to your question: when you call tf.random_normal() you create a Tensor object, which does not have actual values stored. In order to get an output you'll need to run it inside a session. Here's how you can get an actual output:
import tensorflow as tf

rnd_ints = tf.random_normal([10], dtype=tf.float64)
with tf.Session() as sess:
    rnd = sess.run(rnd_ints)
    print(rnd)
# [-1.59628093  0.62648824  0.18566968  0.2274149   1.27171951 -0.18103614
#  -2.05964716  0.37477217  0.3355942  -1.57350681]
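As the quoted documentation says, t.eval() is a shortcut for tf.get_default_session().run(t), so an equivalent way to print the values is:
with tf.Session():
    print(rnd_ints.eval())  # runs tf.get_default_session().run(rnd_ints)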

TensorFlow nullptr check failed on GPU

I am using the python API of TensorFlow to train a variant of an LSTM.
For that purpose I use the tf.while_loop function to iterate over the time steps.
When running my script on the CPU, it does not produce any error messages, but on the GPU, Python crashes with:
...tensorflow/tensorflow/core/framework/tensor.cc:885] Check failed: nullptr != b.buf_ (nullptr vs. 00...)
The part of my code that causes this failure (when I comment it out, it works) is in the body of the while loop:
...
h_gathered = h_ta.gather(tf.range(time))
h_gathered = tf.transpose(h_gathered, [1, 0, 2])
syn_t = self.syntactic_weights_ta.read(time)[:, :time]
syn_t = tf.expand_dims(syn_t, 1)
syn_state_t = tf.squeeze(tf.tanh(tf.matmul(syn_t, h_gathered)), 1)
...
where time is zero-based and incremented after each step, and h_ta is a TensorArray:
h_ta = tf.TensorArray(
    dtype=dtype,
    size=max_seq_len,
    clear_after_read=False,
    element_shape=[batch_size, num_hidden],
    tensor_array_name="fw_output")
and self.syntactic_weights_ta is also a TensorArray:
self.syntactic_weights_ta = tf.TensorArray(
    dtype=dtype,
    size=max_seq_len,
    tensor_array_name="fw_syntactic_weights")
self.syntactic_weights_ta = self.syntactic_weights_ta.unstack(syntactic_weights)
What I am trying to achieve in the code snippet is basically a weighted sum over the past outputs, stored in h_ta.
In the end I train the network with tf.train.AdamOptimizer.
I have tested the script again, this time with the swap_memory parameter of the while loop set to False, and it works on the GPU as well, though I'd really like to know why it does not work with swap_memory=True.
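For reference, the workaround looks like this (a sketch; cond, body, and loop_vars stand in for my actual loop definition):
outputs = tf.while_loop(
    cond=cond,            # e.g. lambda time, *_: time < max_seq_len
    body=body,            # the loop body containing the snippet above
    loop_vars=loop_vars,
    swap_memory=False)    # with True, the nullptr check fails on the GPU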
This looks like a bug in the way that TensorArray's tensor storage mechanisms interact with the allocation magic that is performed by while_loop when swap_memory=True.
Can you open an issue on TF's GitHub? Please also include:
A full stack trace (TF built with -c dbg preferable)
A minimal code example to reproduce
Whether the issue requires calling backprop
Whether this is reproducible in TF 1.2 / nightlies / master branch
And respond here with a link to the GitHub issue?

What is the alternative of tf.Variable.ref() in Tensorflow version 0.12?

I'm trying to run the open code of the A3C reinforcement learning algorithm (the linked "A3C code") to learn about A3C.
However, I got several errors, and I could fix all of them except one.
In the code, ref(), which is a member function of tf.Variable, is used (1, 2), but in the recent TensorFlow version 0.12rc that function seems to be deprecated.
I don't know the best way to replace it (and I don't understand exactly why the author used ref()). When I just changed it to the variable itself (for example, v.ref() to v), there was no error, but the reward did not change. It seems the model cannot learn, and I guess it is because the variables are not properly updated.
Please advise me on the proper way to modify the code so that it works.
The new method tf.Variable.read_value() is the replacement for tf.Variable.ref() in TensorFlow 0.12 and later.
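In most cases the substitution is mechanical; a minimal sketch (the variable name is illustrative, not taken from the A3C code):
import tensorflow as tf

v = tf.Variable([[1.]])
# Old (TensorFlow <= 0.11): force an uncached read of the variable's value
#     t = v.ref()
# New (TensorFlow 0.12 and later):
t = v.read_value()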
The use case for this method is slightly tricky to explain, and is motivated by some caching behavior that causes multiple uses of a remote variable on a different device to use a cached value. Let's say you have the following code:
with tf.device("/cpu:0")
v = tf.Variable([[1.]])
with tf.device("/gpu:0")
# The value of `v` will be captured at this point and cached until `m2`
# is computed.
m1 = tf.matmul(v, ...)
with tf.control_dependencies([m1])
# The assign happens (on the GPU) after `m1`, but before `m2` is computed.
assign_op = v.assign([[2.]])
with tf.control_dependencies([assign_op]):
with tf.device("/gpu:0"):
# The initially read value of `v` (i.e. [[1.]]) will be used here,
# even though `m2` is computed after the assign.
m2 = tf.matmul(v, ...)
sess.run(m2)
You can use tf.Variable.read_value() to force TensorFlow to read the variable again later, and it will be subject to whatever control dependencies are in place. So if you wanted to see the result of the assign when computing m2, you'd modify the last block of the program as follows:
with tf.control_dependencies([assign_op]):
    with tf.device("/gpu:0"):
        # The `read_value()` call will cause TensorFlow to transfer the
        # new value of `v` from the CPU to the GPU before computing `m2`.
        m2 = tf.matmul(v.read_value(), ...)
(Note that, currently, if all of the ops were on the same device, you wouldn't need to use read_value(), because TensorFlow doesn't make a copy of the variable when it is used as the input to an op on the same device. This can cause a lot of confusion—for example when you enqueue a variable to a queue!—and it's one of the reasons that we're working on enhancing the memory model for variables.)

Theano matrix multiplication

I have a piece of code that is supposed to calculate a simple matrix product in Python (using Theano). The matrix I intend to multiply by is a shared variable.
This is the smallest example that demonstrates my problem.
I have made use of two helper functions: floatX converts its input to something of type theano.config.floatX, and init_weights generates a random matrix (of type floatX) of the given dimensions.
The last line causes the code to crash. In fact, it forces so much output onto the command line that I can't even scroll to the top of it anymore.
So, can anyone tell me what I'm doing wrong?
import numpy
import theano
import theano.tensor as T

def floatX(x):
    return numpy.asarray(x, dtype=theano.config.floatX)

def init_weights(shape):
    return floatX(numpy.random.randn(*shape))

a = init_weights([3, 3])
b = theano.shared(value=a, name="b")
x = T.matrix()
y = T.dot(x, b)
f = theano.function([x], y)
This works for me, so my guess is that you have a problem with your BLAS installation. Make sure to use the Theano development version:
http://deeplearning.net/software/theano/install.html#bleeding-edge-install-instructions
It has better defaults for some configurations. If that does not fix the problem, look at the error message: the main part comes after the code dump and the stack trace, and that is normally the most useful part.
You can disable Theano's direct linking against BLAS with this Theano flag: blas.ldflags=
This can cause a slowdown, but it is a quick check to confirm that the problem is BLAS.
If you want more help, dump the error message to a text file, put it on the web, and link to it from here.
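For example, a quick way to run that check from Python (the flag must be set before theano is imported):
import os

# Disable Theano's direct BLAS linkage; if the crash disappears,
# the BLAS installation is the problem. Expect a possible slowdown.
os.environ["THEANO_FLAGS"] = "blas.ldflags="
import theano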
