TensorFlow: should tf.GradientTape() only use tf.Variables? - python

I'm trying to write a reinforcement learning agent using TensorFlow. I'm wondering whether the states have to be tf.Variables or whether they can be numpy arrays for backpropagation with GradientTape. I'm not sure the gradients will be correct if my state/action arrays are numpy arrays instead of TensorFlow tensors; I do know that the loss function returns a tf.Variable, however. Thanks, I'm still a beginner with TensorFlow, so any explanation/suggestions would help a lot.
In a very simplified form (not word for word), my code looks something like:
with tf.GradientTape() as tape:
    # actions/states are both lists of np arrays
    action = model.call(state)
    states.append(state)
    actions.append(action)
    loss = model.loss(states, actions)  # loss returns a tf.Variable
model.optimizer.apply_gradients(zip(tape.gradient(loss, model.variables), model.variables))

Hi Noob :) The optimizer.apply_gradients operation will only update the model's tf.Variables for which gradients were computed (see the input argument model.variables).
Reference: https://www.tensorflow.org/api_docs/python/tf/GradientTape
Trainable variables (created by tf.Variable or tf.compat.v1.get_variable, where trainable=True is default in both cases) are automatically watched. Tensors can be manually watched by invoking the watch method on this context manager.
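In other words, the states themselves do not have to be tf.Variables: the gradients are taken with respect to the model's trainable variables, which the tape watches automatically, and a numpy input is converted to a tensor when it enters the model. A minimal sketch of the pattern (the toy model, loss, and optimizer below are made-up placeholders, not your actual code):

import numpy as np
import tensorflow as tf

# Hypothetical toy model and optimizer, for illustration only.
model = tf.keras.Sequential([tf.keras.layers.Dense(4, activation="relu"),
                             tf.keras.layers.Dense(2)])
optimizer = tf.keras.optimizers.Adam()

state = np.random.rand(1, 8).astype(np.float32)  # plain numpy array

with tf.GradientTape() as tape:
    action = model(state)                     # numpy input is converted to a tensor
    loss = tf.reduce_mean(tf.square(action))  # stand-in for the real RL loss

# The tape automatically watched model.trainable_variables, so these
# gradients are valid even though the state was a numpy array.
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))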
Edit: if you want to call the model to make a prediction given a numpy array, this is sort of possible. According to the documentation, the input of model.call() should be a tensor object. You can simply get a tensor from your numpy array as:
state # numpy array
tf_state = tf.constant(state)
model.call(tf_state)
Of course, instead of creating a new tf.constant for each iteration of the training loop, you can first initialize a (non-trainable) tf.Variable and then just update its values with those of the numpy array. Something like the following should work:
tf_state = tf.Variable(np.zeros_like(state), dtype=tf.float32, trainable=False)
for i in range(n_train_iterations):
    state = get_new_numpy_state()
    tf_state.assign(state)
    model.call(tf_state)

Related

Output of a layer with weight argument in TensorFlow

Is there a way in TensorFlow to compute the output of a layer while specifying the weights, something like y = layer(x, weights=w)?
The final purpose is to compute the gradient of some function of the weights, $w \mapsto layer(x, weights = f(w))$; however, automatic differentiation does not seem to work with layer.set_weights.
To update variables you must use their .assign method. See https://www.tensorflow.org/api_docs/python/tf/Variable for more details. You can also most definitely pass weights to a layer; you would need to create a custom layer by subclassing tf.keras.layers.Layer. See https://www.tensorflow.org/tutorials/customization/custom_layers for more details.
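As a minimal sketch of that idea (the layer name and the choice f(w) = tanh(w) are made up for illustration):

import tensorflow as tf

# Hypothetical layer that takes its weights as an explicit call argument,
# so gradients can flow through whatever function produced them.
class DenseWithExternalWeights(tf.keras.layers.Layer):
    def call(self, x, weights):
        return tf.matmul(x, weights)

layer = DenseWithExternalWeights()
x = tf.random.normal([3, 5])
w = tf.Variable(tf.random.normal([5, 2]))

with tf.GradientTape() as tape:
    y = layer(x, tf.tanh(w))       # y = layer(x, weights=f(w)) with f = tanh
    loss = tf.reduce_sum(y)

grad_w = tape.gradient(loss, w)    # gradient of the loss w.r.t. w, through f and the layer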

Accessing training data during tensorflow graph execution

I'd like to use pre-trained sentence embeddings in my tensorflow graph execution model. The embeddings are available dynamically from a function call, which takes in an array of sentences and outputs an array of sentence embeddings. This function uses a pre-trained pytorch model so has to remain separate from the tensorflow model I'm training:
def get_pretrained_embeddings(sentences):
    return pretrained_pytorch_model.encode(sentences)
My tensorflow model looks like this:
class SentenceModel(tf.keras.Model):
    def __init__(self):
        super().__init__()

    def call(self, sentences):
        embedding_layer = tf.keras.layers.Embedding(
            10_000,
            256,
            embeddings_initializer=tf.keras.initializers.Constant(get_pretrained_embeddings(sentences)),
            trainable=False,
        )
        sentence_text_embedding = tf.keras.Sequential([
            embedding_layer,
            tf.keras.layers.GlobalAveragePooling1D(),
        ])
        return sentence_text_embedding(sentences)
But when I try to train this model using
cached_train = train.shuffle(100_000).batch(1024)
model.fit(cached_train)
my embeddings_initializer call gets the error:
OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.
I assume this is because tensorflow is trying to compile the graph using symbolic data. How can I get my external function, which relies on the current training data batch, to work with tensorflow's graph training?
Tensorflow compiles models to an execution graph before performing the actual training process. The obvious side effect that clues us into this is that if we have a regular Python print() statement in e.g. our call() method, it will only get executed once as Tensorflow runs through your code to construct the execution graph, which it will later convert to native code.
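A quick standalone way to see this behaviour (not the asker's model): a Python print() inside a tf.function runs only while the graph is being traced, whereas tf.print becomes an op in the graph and runs on every call.

import tensorflow as tf

@tf.function
def f(x):
    print("tracing")       # Python side effect: runs only during graph tracing
    tf.print("executing")  # graph op: runs on every call
    return x + 1

f(tf.constant(1))  # prints "tracing" and "executing"
f(tf.constant(2))  # prints only "executing"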
The other side effect of this is that you cannot use anything that isn't a tensor of some description when training. By 'tensor' here, all of the following can be considered a tensor:
The input value of your call() method (obviously)
A tf.keras.Sequential
A tf.keras.Model/tf.keras.layers.Layer subclass
A SparseTensor
A tf.constant()
....probably more I haven't listed here.
To this end, you would need to convert your PyTorch model to a Tensorflow one to be able to reference it in a subclass of tf.keras.Model/tf.keras.layers.Layer.
As a side note, if you do find you need to iterate a tensor, you should just be able to iterate it on the 1st dimension (i.e. the batch size) like so:
for part in some_tensor:
    pass
If you want to iterate on some other dimension, I recommend doing a tf.unstack(some_tensor, axis=AXIS_NUMBER_HERE) first and iterate over the result thereof.
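For example (a small standalone sketch):

import tensorflow as tf

x = tf.reshape(tf.range(24), [2, 3, 4])

# Iterating a tensor directly walks the first (batch) dimension; to iterate
# another axis, unstack along it first. Each part below has shape [2, 4].
for part in tf.unstack(x, axis=1):
    tf.print(tf.shape(part))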

What's the purpose of torch.autograd.Variable?

I load features and labels from my training dataset. Both of them are originally numpy arrays, but I convert them to torch tensors using torch.from_numpy(features.copy()) and torch.tensor(labels.astype(np.bool)).
And I notice that torch.autograd.Variable is something like a placeholder in tensorflow.
When I train my network, first I tried
features = features.cuda()
labels = labels.cuda()
outputs = Config.MODEL(features)
loss = Config.LOSS(outputs, labels)
Then I tried
features = features.cuda()
labels = labels.cuda()
input_var = Variable(features)
target_var = Variable(labels)
outputs = Config.MODEL(input_var)
loss = Config.LOSS(outputs, target_var)
Both blocks train successfully, but I worry that there might be some subtle difference.
According to this question you no longer need Variables to use PyTorch autograd.
Thanks to #skytree, we can make this even more explicit: Variables have been deprecated, i.e. you're not supposed to use them anymore.
Autograd automatically supports Tensors with requires_grad set to True.
And more importantly
Variable(tensor) and Variable(tensor, requires_grad) still work as expected, but they return Tensors instead of Variables.
This means that if your features and labels are already tensors (which they seem to be in your example), Variable(features) and Variable(labels) will simply return tensors again.
The original purpose of Variables was to be able to use automatic differentiation (Source):
Variables are just wrappers for the tensors so you can now easily auto compute the gradients.
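As a small standalone illustration (made-up shapes, not the asker's data):

import torch

# Tensors created with requires_grad=True take part in autograd directly;
# no Variable wrapper is needed.
features = torch.randn(4, 3)
weights = torch.randn(3, 1, requires_grad=True)

loss = (features @ weights).sum()
loss.backward()

print(weights.grad.shape)  # torch.Size([3, 1]); the gradient lives on the tensor itself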

InvalidArgumentError: You must feed a value for placeholder tensor 'input_1' with dtype float and shape [?,1339,2560,1]

I'm a rookie to tensorflow, and I get the following error when using keras:
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'input_1' with dtype float and shape [?,1339,2560,1]
because in my model:
model = Model(input=inputs, output=conv13).
The input size is [?,1339,2560,1] and the output size is [?,1328,2560,1] after cropping, so I want to use pad in numpy to make up for the difference:
sess=tf.Session()
sess.run(tf.initialize_all_variables())
conv12_ar = conv12.eval(session=sess)
conv13_tem = np.pad(conv12_ar, ((0, 0),(5, 6), (0, 0), (0, 0)), 'edge')
conv13 = tf.convert_to_tensor(conv13_tem)
and I get the error above. Can anyone help me, or suggest another way to make up the difference?
I suspect that you are trying to treat tensorflow as a procedural language and not declarative. It's almost always a bug if you have tensorflow statements after you've created your session, and among the most common errors. You haven't posted quite enough of your code to make out exactly where the issue is though.
It appears that you are using numpy to do the padding; if so, the result of that padding should be passed into sess.run via the feed_dict, and you should not need a session to do the padding itself.
Also, you can simply add the padding step to the tensorflow graph, so take the input in shape [?, 1328, 2560, 1], then right after the definition of the tf.placeholder use tf.pad to pad it. Then make sure the placeholder is expecting the smaller form and everything should work.
I always recommend creating a build_graph() function where you put all your tensor and OP definitions. You should never create any tensorflow constructs after opening the session. Think of tensorflow as having 2 phases: (1) you build a graph of the operations you want to use, then (2) you pass data into the graph (the placeholders) and ask for various values to be computed (sess.run).
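A rough sketch of that structure, assuming the padding amounts from the question and noting that tf.pad has no exact equivalent of numpy's 'edge' mode ('SYMMETRIC' is the closest built-in):

import tensorflow as tf  # TF 1.x style, matching the question

def build_graph():
    # The placeholder expects the smaller (cropped) shape ...
    inputs = tf.placeholder(tf.float32, shape=[None, 1328, 2560, 1])
    # ... and the padding is an op in the graph, not a numpy step done after the session exists.
    padded = tf.pad(inputs, [[0, 0], [5, 6], [0, 0], [0, 0]], mode='SYMMETRIC')
    return inputs, padded

inputs, padded = build_graph()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # feed real data for `inputs` here and request `padded` (or a loss) via sess.run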

Tensorflow, py_func, or custom function

I'm currently working on a quaternionic neural network using TensorFlow (I want to use GPUs). TensorFlow doesn't have support for quaternions, but you can represent them as 4x4 real matrices, so it might be possible to build such a neural network in TensorFlow.
Is there a simple way to add a custom operation or to do a custom operation on tensors?
For example, I can write:
output_activation = tf.nn.softmax(tf.matmul(hidden_activation, Weight_to_ouput))
...and that's pretty cool! All you have to do is add a loss function and then do backpropagation. However, I want to do the same thing but with quaternions, for example:
output_activation = mySigmoid(myFunction(hidden_activation, Weight_to_output))
However, I need to transform the quaternions to and from tensors to optimize the GPU calculation. So I need to create a function that gets some tensors as parameters and returns the transformed tensors.
I've looked at py_func, but it seems that you can't return tensors.
I tried the following, but it failed:
def layerActivation(inputTensor, WeightTensor):
    newTensor = tf.matmul(inputTensor, WeightTensor)
    return newTensor
...and in main():
x = placeholder ...
W_to_hidden = tf.Variable
test = tf.py_func(layerActivation, [x, W_to_hidden], [tf.float32])
with tf.Session() as sess:
    tf.initialize_all_variables().run()
    king_return = sess.run(test, feed_dict={x: qtrain})
Error : Unimplemented: Unsupported object type Tensor
Ideally I could use this output_activation in the standard backprop algorithm of TensorFlow but I don't know if it's possible.
Depending on the functionality required, you might be able to implement your operation as a composition of existing TensorFlow ops, without needing to use tf.py_func().
For example, the following works and will run on a GPU:
def layer_activation(input_tensor, weight_tensor):
    return tf.matmul(input_tensor, weight_tensor)
# ...
x = tf.placeholder(...)
W_to_hidden = tf.Variable(...)
test = layer_activation(x, W_to_hidden)
# ...
The main reason to use tf.py_func() is if your operations cannot be implemented using TensorFlow operations, and you want to inject some Python code (e.g. using NumPy) that works on the actual values of your tensor.
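A tiny sketch of that pattern (the numpy sigmoid is just a placeholder for whatever Python-side computation you need; note that tf.py_func runs on the CPU and has no gradient registered by default):

import numpy as np
import tensorflow as tf  # TF 1.x style, matching the question

def np_sigmoid(x):
    # Inside py_func, x arrives as a plain numpy array, not a Tensor.
    return 1.0 / (1.0 + np.exp(-x))

inp = tf.placeholder(tf.float32, shape=[None, 4])
out = tf.py_func(np_sigmoid, [inp], tf.float32)

with tf.Session() as sess:
    print(sess.run(out, feed_dict={inp: np.random.rand(2, 4)}))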
However, if your mySigmoid() or myFunction() operations cannot be implemented in terms of existing TensorFlow operations, and you want to implement them on GPU, then—as keveman says—you will need to add a new op.
If you want to run your custom operations on GPUs, you have to provide GPU implementation (kernels) in C++. Look at the documentation here for how to extend TensorFlow with custom operations, and especially the section on GPU support.
