Error while clipping gradient - python

I was following this tensorflow tutorial for gradient clipping while working with a multilayer perceptron.
grads_and_vars = optimizer.compute_gradients(cross_entropy_loss, trainable_variable)
capped_grads_and_vars = [(tf.clip_by_global_norm(gv[0],5), gv[1]) for gv in grads_and_vars]
optimizer.apply_gradients(capped_grads_and_vars)
tensorflow shows the following error,
in clip_by_global_norm raise TypeError("t_list should be a sequence")
trainable_variable is a list which I created while creating the model. assume I have a trainable variable(tf.Variable), I add this variable to trainable_variable list by the following command.
trainable_variable.append(var) #where ver is a trainable variable in tensorflow

The key point of this type of problem is, trainable_variable list may contain multiple tensors who are not initialized or used in the graph. make sure you contain all the tensor safely in the trainable_variable list. Sometimes even they might contain NaN for gradient computation. This type of error may also introduce for unnatural value.

Related

Tensorflow tf.GradientTape() should only use Tf.variables?

I'm trying to write a reinforcement learning agent using tensorflow. I'm wondering if the states should be tf.Variables or can be numpy arrays for backpropogation using gradient tape. I'm not sure if the gradients will be correct if my states/action arrays are numpy instead of tensorflow arrays, I do know that the loss function returns a tf.Variable however. Thanks, I'm still a beginner to using Tensorflow any explanation/suggestions would help alot.
In a very simplified form (not word for word), my code looks something like:
with tf.GradientTape as tape:
#actions/states are both lists of np arrays
action = model.call(state)
states.append(state)
actions.append(actions)
loss = model.loss(states,actions) #loss returns tf.variable
model.optimizer.apply_gradients(tape.gradient(loss, model.variables)
Hi Noob :) The optimizer.apply_gradients operation will update only model tf.Variables having non-zero gradients (see input argument model.variables).
Reference: https://www.tensorflow.org/api_docs/python/tf/GradientTape
Trainable variables (created by tf.Variable or
tf.compat.v1.get_variable, where trainable=True is default in both
cases) are automatically watched. Tensors can be manually watched by
invoking the watch method on this context manager.
Edit: if you want to call the model to make a predictions given a numpy array: this is sort of possible. According to the documentation the input of model.call() should be a tensor object. You can simply get a tensor from your numpy array as:
state # numpy array
tf_state = tf.constant(state)
model.call(tf_state)
Of course, instead of creating new tf.constants for each iteration of the training loop, you can first initialize a (non-trainable) tf.Variables, and then just update its values with those of the numpy array! Something like the following should work:
tf_state = tf.Variable(np.zeros_like(state), dtype=tf.float32, trainable=False)
for iter in n_train_iterations:
state = get_new_numpy_state()
tf_state.assign(state)
model.call(tf_state)

Tensorflow & Keras: Different output shape depends on training or inferring

I am sub-classing tensorflow.keras.Model to implement a certain model. Expected behavior:
Training (fitting) time: returns a list of tensors including the final output and auxiliary output;
Inferring (predicting) time: returns a single output tensor.
And the code is:
class SomeModel(tensorflow.keras.Model):
# ......
def call(self, x, training=True):
# ......
return [aux1, aux2, net] if training else net
This is how i use it:
model=SomeModel(...)
model.compile(...,
loss=keras.losses.SparseCategoricalCrossentropy(),
loss_weights=[0.4, 0.4, 1],...)
# ......
model.fit(data, [labels, labels, labels])
And got:
AssertionError: in converted code:
ipython-input-33-862e679ab098:140 call *
`return [aux1, aux2, net] if training else net`
...\tensorflow_core\python\autograph\operators\control_flow.py:918 if_stmt
Then the problem is that the if statement is converted into the calculation graph and this would of course cause the problem. I found the whole stack trace is long and useless so it's not included here.
So, is there any way to make TensorFlow generate different graph based on training or not?
Which tensorflow version are you using? You can overwrite behaviour in the .fit, .predict and .evaluate methods in Tensorflow 2.2, which would generate different graphs for these methods (I assume) and potentially work for your use-case.
The problems with earlier versions is that subclassed models get created by tracing the call method. This means Python conditionals become Tensorflow conditionals and face several limitations during graph creation and execution.
First, both branches (if-else) have to be defined, and regarding python collections (eg. lists), the branches have to have the same structure (eg. number of elements). You can read about the limitations and effects of Autograph here and here.
(Also, a conditional may not get evaluated at every run, if the condition is based on a Python variable and not a tensor.)

Keras Concatenate Layers: Difference between different types of concatenate functions

I just recently started playing around with Keras and got into making custom layers. However, I am rather confused by the many different types of layers with slightly different names but with the same functionality.
For example, there are 3 different forms of the concatenate function from https://keras.io/layers/merge/ and https://www.tensorflow.org/api_docs/python/tf/keras/backend/concatenate
keras.layers.Concatenate(axis=-1)
keras.layers.concatenate(inputs, axis=-1)
tf.keras.backend.concatenate()
I know the 2nd one is used for functional API but what is the difference between the 3? The documentation seems a bit unclear on this.
Also, for the 3rd one, I have seen a code that does this below. Why must there be the line ._keras_shape after the concatenation?
# Concatenate the summed atom and bond features
atoms_bonds_features = K.concatenate([atoms, summed_bond_features], axis=-1)
# Compute fingerprint
atoms_bonds_features._keras_shape = (None, max_atoms, num_atom_features + num_bond_features)
Lastly, under keras.layers, there always seems to be 2 duplicates. For example, Add() and add(), and so on.
First, the backend: tf.keras.backend.concatenate()
Backend functions are supposed to be used "inside" layers. You'd only use this in Lambda layers, custom layers, custom loss functions, custom metrics, etc.
It works directly on "tensors".
It's not the choice if you're not going deep on customizing. (And it was a bad choice in your example code -- See details at the end).
If you dive deep into keras code, you will notice that the Concatenate layer uses this function internally:
import keras.backend as K
class Concatenate(_Merge):
#blablabla
def _merge_function(self, inputs):
return K.concatenate(inputs, axis=self.axis)
#blablabla
Then, the Layer: keras.layers.Concatenate(axis=-1)
As any other keras layers, you instantiate and call it on tensors.
Pretty straighforward:
#in a functional API model:
inputTensor1 = Input(shape) #or some tensor coming out of any other layer
inputTensor2 = Input(shape2) #or some tensor coming out of any other layer
#first parentheses are creating an instance of the layer
#second parentheses are "calling" the layer on the input tensors
outputTensor = keras.layers.Concatenate(axis=someAxis)([inputTensor1, inputTensor2])
This is not suited for sequential models, unless the previous layer outputs a list (this is possible but not common).
Finally, the concatenate function from the layers module: keras.layers.concatenate(inputs, axis=-1)
This is not a layer. This is a function that will return the tensor produced by an internal Concatenate layer.
The code is simple:
def concatenate(inputs, axis=-1, **kwargs):
#blablabla
return Concatenate(axis=axis, **kwargs)(inputs)
Older functions
In Keras 1, people had functions that were meant to receive "layers" as input and return an output "layer". Their names were related to the merge word.
But since Keras 2 doesn't mention or document these, I'd probably avoid using them, and if old code is found, I'd probably update it to a proper Keras 2 code.
Why the _keras_shape word?
This backend function was not supposed to be used in high level codes. The coder should have used a Concatenate layer.
atoms_bonds_features = Concatenate(axis=-1)([atoms, summed_bond_features])
#just this line is perfect
Keras layers add the _keras_shape property to all their output tensors, and Keras uses this property for infering the shapes of the entire model.
If you use any backend function "outside" a layer or loss/metric, your output tensor will lack this property and an error will appear telling _keras_shape doesn't exist.
The coder is creating a bad workaround by adding the property manually, when it should have been added by a proper keras layer. (This may work now, but in case of keras updates this code will break while proper codes will remain ok)
Keras historically supports 2 different interfaces for their layers, the new functional one and the old one, that requires model.add() calls, hence the 2 different functions.
For the TF -- their concatenate() functions does not do everything that required for Keras to work, hence, the additional calls to make ._keras_shape variable correct and not to upset Keras that expects that variable to have some particular value.

Return layer activations and weights from Tensorflow model in separate threads

I am attempting to retrieve an output from a tensorflow model (global mModel) loaded in one thread (using keras.models.model_from_json and load_weights) and run (using predict) in another on a webserver. How can I also provide outputs from hidden layers and the network weights?
In some attempts of predicting on models of intermediate layers by creating models as below I have gotten an error including "tensor is not an element of this graph".
for modelLayer in mModel.layers:
if not modelLayer.output == mModel.input:
intermediateModel = keras.models.Model(inputs=mModel.input, outputs=modelLayer.output)
layerActivations = intermediateModel.predict(np.array([inputs]))[0]
When attempting to get the weights using a session generated in the origin thread (mSess)
mModel.layers[1].weights[0].eval(session=mSess)
I get the error:
FailedPreconditionError (see above for traceback): Error while reading resource variable dense/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/dense/kernel)
[[Node: dense/kernel/Read/ReadVariableOp = ReadVariableOpdtype=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
In attempts to return the layer weights using a new session and the appropriate graph
sess = tf.Session(graph=mModel.output.graph)
sess.run(tf.global_variables_initializer())
mModel.layers[1].weights[0].eval(session=sess)
I get the error:
ValueError: Fetch argument cannot be interpreted as a Tensor. (Operation name: "init"
op: "NoOp" is not an element of this graph.)
The error "tensor is not an element of this graph" can be resolved by using the graph associated with a tensor in the model.
with mModel.output.graph.as_default():
for modelLayer in mModel.layers:
if not modelLayer.output == mModel.input:
intermediateModel = keras.models.Model(inputs=mModel.input, outputs=modelLayer.output)
layerActivations = intermediateModel.predict(np.array([inputs]))[0]
Edit: Updated Solution
While the original solution below fixes the problem reported, the values provided for the weights using ...eval(sess) are initialized values and not learned values or values that predict uses. There may be a way to use eval to get the proper result, but I am not aware of it. The alternative solution I found is to use get_weights() on the model or layer, as in:
mModel.get_weights()
mModel.layers[1].get_weights()
Original solution
The problem with resolving the weights is a combination of using the proper graph and initializing the session with the weights' initializer rather than the global initializer.
sess = tf.Session(graph=mModel.output.graph)
weights = modelLayer.weights[0]
sess.run(weights.initializer)
weightsValues = weights.eval(session=sess)
These solutions work across threads.

Tensorflow: Using weights trained in one model inside another, different model

I'm trying to train an LSTM in Tensorflow using minibatches, but after training is complete I would like to use the model by submitting one example at a time to it. I can set up the graph within Tensorflow to train my LSTM network, but I can't use the trained result afterward in the way I want.
The setup code looks something like this:
#Build the LSTM model.
cellRaw = rnn_cell.BasicLSTMCell(LAYER_SIZE)
cellRaw = rnn_cell.MultiRNNCell([cellRaw] * NUM_LAYERS)
cell = rnn_cell.DropoutWrapper(cellRaw, output_keep_prob = 0.25)
input_data = tf.placeholder(dtype=tf.float32, shape=[SEQ_LENGTH, None, 3])
target_data = tf.placeholder(dtype=tf.float32, shape=[SEQ_LENGTH, None])
initial_state = cell.zero_state(batch_size=BATCH_SIZE, dtype=tf.float32)
with tf.variable_scope('rnnlm'):
output_w = tf.get_variable("output_w", [LAYER_SIZE, 6])
output_b = tf.get_variable("output_b", [6])
outputs, final_state = seq2seq.rnn_decoder(input_list, initial_state, cell, loop_function=None, scope='rnnlm')
output = tf.reshape(tf.concat(1, outputs), [-1, LAYER_SIZE])
output = tf.nn.xw_plus_b(output, output_w, output_b)
...Note the two placeholders, input_data and target_data. I haven't bothered including the optimizer setup. After training is complete and the training session closed, I would like to set up a new session that uses the trained LSTM network whose input is provided by a completely different placeholder, something like:
with tf.Session() as sess:
with tf.variable_scope("simulation", reuse=None):
cellSim = cellRaw
input_data_sim = tf.placeholder(dtype=tf.float32, shape=[1, 1, 3])
initial_state_sim = cell.zero_state(batch_size=1, dtype=tf.float32)
input_list_sim = tf.unpack(input_data_sim)
outputsSim, final_state_sim = seq2seq.rnn_decoder(input_list_sim, initial_state_sim, cellSim, loop_function=None, scope='rnnlm')
outputSim = tf.reshape(tf.concat(1, outputsSim), [-1, LAYER_SIZE])
with tf.variable_scope('rnnlm'):
output_w = tf.get_variable("output_w", [LAYER_SIZE, nOut])
output_b = tf.get_variable("output_b", [nOut])
outputSim = tf.nn.xw_plus_b(outputSim, output_w, output_b)
This second part returns the following error:
tensorflow.python.framework.errors.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype float
[[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
...Presumably because the graph I'm using still has the old training placeholders attached to the trained LSTM nodes. What's the right way to 'extract' the trained LSTM and put it into a new, different graph that has a different style of inputs? The Varible scoping features that Tensorflow has seem to address something like this, but the examples in the documentation all talk about using variable scope as a way of managing variable names so that the same piece of code will generate similar subgraphs within the same graph. The 'reuse' feature seems to be close to what I want, but I don't find the Tensorflow documentation linked above to be clear at all on what it does. The cells themselves cannot be given a name (in other words,
cellRaw = rnn_cell.MultiRNNCell([cellRaw] * NUM_LAYERS, name="multicell")
is not valid), and while I can give a name to a seq2seq.rnn_decoder(), I presumably wouldn't be able to remove the rnn_cell.DropoutWrapper() if I used that node unchanged.
Questions:
What is the proper way to move trained LSTM weights from one graph to another?
Is it correct to say that starting a new session "releases resources", but doesn't erase the graph built in memory?
It seems to me like the 'reuse' feature allows Tensorflow to search outside of the current variable scope for variables with the same name (existing in a different scope), and use them in the current scope. Is this correct? If it is, what happens to all of the graph edges from the non-current scope that link to that variable? If it isn't, why does Tensorflow throw an error if you try to have the same variable name within two different scopes? It seems perfectly reasonable to define two variables with identical names in two different scopes, e.g. conv1/sum1 and conv2/sum1.
In my code I'm working within a new scope but the graph won't run without data to be fed into a placeholder from the initial, default scope. Is the default scope always 'in-scope' for some reason?
If graph edges can span different scopes, and names in different scopes can't be shared unless they refer to the exact same node, then that would seem to defeat the purpose of having different scopes in the first place. What am I misunderstanding here?
Thanks!
What is the proper way to move trained LSTM weights from one graph to another?
You can create your decoding graph first (with a saver object to save the parameters) and create a GraphDef object that you can import in your bigger training graph:
basegraph = tf.Graph()
with basegraph.as_default():
***your graph***
traingraph = tf.Graph()
with traingraph.as_default():
tf.import_graph_def(basegraph.as_graph_def())
***your training graph***
make sure you load your variables when you start a session for a new graph.
I don't have experience with this functionality so you may have to look into it a bit more
Is it correct to say that starting a new session "releases resources", but doesn't erase the graph built in memory?
yep, the graph object still hold it
It seems to me like the 'reuse' feature allows Tensorflow to search outside of the current variable scope for variables with the same name (existing in a different scope), and use them in the current scope. Is this correct? If it is, what happens to all of the graph edges from the non-current scope that link to that variable? If it isn't, why does Tensorflow throw an error if you try to have the same variable name within two different scopes? It seems perfectly reasonable to define two variables with identical names in two different scopes, e.g. conv1/sum1 and conv2/sum1.
No, reuse is to determine the behaviour when you use get_variable on an existing name, when it is true it will return the existing variable, otherwise it will return a new one. Normally tensorflow should not throw an error. Are you sure your using tf.get_variable and not just tf.Variable?
In my code I'm working within a new scope but the graph won't run without data to be fed into a placeholder from the initial, default scope. Is the default scope always 'in-scope' for some reason?
I don't really see what you mean. The do not always have to be used. If a placeholder is not required for running an operation you don't have to define it.
If graph edges can span different scopes, and names in different scopes can't be shared unless they refer to the exact same node, then that would seem to defeat the purpose of having different scopes in the first place. What am I misunderstanding here?
I think your understanding or usage of scopes is flawed, see above

Categories

Resources