def biLSTM(data, n_steps):
    n_hidden = 24
    # Permute to (n_steps, batch_size, n_input)
    data = tf.transpose(data, [1, 0, 2])
    # Reshape to (n_steps*batch_size, n_input)
    data = tf.reshape(data, [-1, 300])
    # Split to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    data = tf.split(0, n_steps, data)
    # Forward direction cell
    lstm_fw_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Backward direction cell
    lstm_bw_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    outputs, _, _ = tf.nn.bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, data, dtype=tf.float32)
    return outputs, n_hidden
In my code I call this function twice to create two bidirectional LSTMs. This runs into a variable-reuse problem:

ValueError: Variable lstm/BiRNN_FW/BasicLSTMCell/Linear/Matrix
already exists, disallowed. Did you mean to set reuse=True in
VarScope?
To resolve this I wrapped the LSTM definition inside the function in

with tf.variable_scope('lstm', reuse=True) as scope:

This led to a new error:

ValueError: Variable lstm/BiRNN_FW/BasicLSTMCell/Linear/Matrix does
not exist, disallowed. Did you mean to set reuse=None in VarScope?

How can I resolve this?
When you create BasicLSTMCell(), it creates all the required weights and biases to implement an LSTM cell under the hood. All of these variables are assigned names automatically. If you call the function more than once within the same scope, you get exactly the error you saw. Since your question states that you want to create two separate LSTM cells, you do not want to reuse the variables; you want to create them in separate scopes. You can do this in two different ways (I haven't actually tried to run this code, but it should work). You can call your function from within a unique scope:
def biLSTM(data, n_steps):
    ... blah ...

with tf.variable_scope('LSTM1'):
    outputs, hidden = biLSTM(data, steps)
with tf.variable_scope('LSTM2'):
    outputs, hidden = biLSTM(data, steps)
or you can pass a unique scope name to the function and use the scope inside it:
def biLSTM(data, n_steps, layer_name):
    ... blah ...
    with tf.variable_scope(layer_name) as scope:
        lstm_fw_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
        lstm_bw_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
        outputs, _, _ = tf.nn.bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, data, dtype=tf.float32)
    return outputs, n_hidden

l1 = biLSTM(data, steps, 'layer1')
l2 = biLSTM(data, steps, 'layer2')
It is up to your coding sensibilities which approach to choose; they are functionally pretty much the same.
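For completeness: if you ever do want two calls to share a single set of weights, you would keep one scope and explicitly mark it for reuse before the second call. A minimal sketch, assuming data1 and data2 stand in for your two inputs (names are illustrative):

with tf.variable_scope('lstm') as scope:
    out1, n_hidden = biLSTM(data1, steps)
    scope.reuse_variables()  # the second call now reuses the existing variables
    out2, n_hidden = biLSTM(data2, steps)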
I also had a similar problem. However, I was using the Keras implementation with a pretrained ResNet50 model.

It worked for me when I updated the tensorflow version using the following command:

conda update -f -c conda-forge tensorflow

and used

from keras import backend as K
K.clear_session()
I am implementing an encoder-decoder model using a bidirectional RNN for both the encoder and the decoder. Since the weights and variables associated with the bidirectional RNN are already created when I initialize it on the encoder side, I get the following error when I try to initialize another instance on the decoder side:

ValueError: Variable bidirectional_rnn/fw/gru_cell/w_ru already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?

I tried defining each within its own name_scope like below, but to no avail:
def enc(message, weights, biases):
    message = tf.unstack(message, timesteps_enc, 1)
    fw_cell = rnn.GRUBlockCell(num_hidden_enc)
    bw_cell = rnn.GRUBlockCell(num_hidden_enc)
    with tf.name_scope("encoder"):
        outputs, _, _ = rnn.static_bidirectional_rnn(fw_cell, bw_cell, message, dtype=tf.float32)
    return tf.matmul(outputs[-1], weights) + biases

def dec(codeword, weights, biases):
    codeword = tf.expand_dims(codeword, axis=2)
    codeword = tf.unstack(codeword, timesteps_dec, 1)
    fw_cell = rnn.GRUBlockCell(num_hidden_dec)
    bw_cell = rnn.GRUBlockCell(num_hidden_dec)
    with tf.name_scope("decoder"):
        outputs, _, _ = rnn.static_bidirectional_rnn(fw_cell, bw_cell, codeword, dtype=tf.float32)
    return tf.matmul(outputs[-1], weights) + biases
Can someone please hint at what I am doing wrong?
Just putting it as an answer:

Try exchanging name_scope for variable_scope. I'm not sure if it is still valid, but for older versions of TF, usage of name_scope was not encouraged. From your variable name bidirectional_rnn/fw/gru_cell/w_ru you can see that the scope is not applied.

One thing is that you cannot create variables with the same name in the same scope, and name_scope does not affect the names of variables created via tf.get_variable, so the encoder and decoder cells collide. Changing name_scope to variable_scope will fix the training, as sketched below.
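A minimal sketch of that fix for the encoder from the question (the decoder gets the same treatment with a "decoder" scope):

def enc(message, weights, biases):
    message = tf.unstack(message, timesteps_enc, 1)
    with tf.variable_scope("encoder"):
        fw_cell = rnn.GRUBlockCell(num_hidden_enc)
        bw_cell = rnn.GRUBlockCell(num_hidden_enc)
        # Variables are now created under the "encoder" scope and no longer
        # collide with those created by the decoder
        outputs, _, _ = rnn.static_bidirectional_rnn(fw_cell, bw_cell, message, dtype=tf.float32)
    return tf.matmul(outputs[-1], weights) + biases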
The other thing is that such a model cannot work as an encoder-decoder model, because the decoder RNN cannot be bidirectional. You indeed have the entire target sequences at training time, but at inference time you generate the target left-to-right. This means you only have the left context for the forward RNN; you don't have the right context for the backward RNN.
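A sketch of what a unidirectional decoder could look like, reusing the names from the question (note that rnn.static_rnn returns (outputs, state) rather than the three values of the bidirectional variant):

def dec(codeword, weights, biases):
    codeword = tf.expand_dims(codeword, axis=2)
    codeword = tf.unstack(codeword, timesteps_dec, 1)
    with tf.variable_scope("decoder"):
        cell = rnn.GRUBlockCell(num_hidden_dec)
        # Forward-only RNN: at inference time only the left context exists
        outputs, _ = rnn.static_rnn(cell, codeword, dtype=tf.float32)
    return tf.matmul(outputs[-1], weights) + biases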
I am trying to create a very simple neural network that reads in information with the shape 1x2048 and produces a classification for two categories (object or not object). The graph structure, however, deviates from what I believe I have coded: the dense layers should be included in the scope of "inner_layers" and should receive their input from the "input" placeholder. Instead, TF seems to treat them as independent layers which do not receive any information from "input".

Also, when trying to use TensorBoard summaries, I get an error telling me that I have not provided inputs for the apparent placeholders of the dense layers. When omitting TensorBoard, everything works as I expect based on the code.

I have spent a lot of time trying to find the problem, but I think I must be overlooking something very basic.

The graph I get in TensorBoard is in this image, which corresponds to the following code:
tf.reset_default_graph()
keep_prob = 0.5

# Graph structure
## Placeholders for input
with tf.name_scope('input'):
    x_ = tf.placeholder(tf.float32, shape=[None, transfer_values_train.shape[1]], name="input1")
    y_ = tf.placeholder(tf.float32, shape=[None, num_classes], name="labels")

## Dense layer one with 2048 nodes
with tf.name_scope('inner_layers'):
    first_layer = tf.layers.dense(x_, units=2048, activation=tf.nn.relu, name="first_dense")
    dropout_layer = tf.nn.dropout(first_layer, keep_prob, name="dropout_layer")
    # Readout layer, without softmax
    y_conv = tf.layers.dense(dropout_layer, units=2, activation=tf.nn.relu, name="second_dense")

# Evaluation and training
with tf.name_scope('cross_entropy'):
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_, logits=y_conv),
        name="cross_entropy_layer")

with tf.name_scope('trainer'):
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

with tf.name_scope('accuracy'):
    prediction = tf.argmax(y_conv, axis=1)
    correct_prediction = tf.equal(prediction, tf.argmax(y_, axis=1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Does anyone have an idea why the graph is so different from what you would expect based on the code?
The graph rendering in TensorBoard may be a bit confusing at first, but it's correct. Take a look at this picture, where I've left only the inner_layers part of your graph.

You may notice that:

first_dense and second_dense are actually name scopes themselves (generated by the tf.layers.dense function; see also this question).

Their input/output tensors are inside the inner_layers scope and wire correctly to the dropout_layer. Here, in each of the dense layers, live the corresponding linear ops: MatMul, BiasAdd, Relu.

Both scopes also include the variables (a kernel and a bias each), which are shown separately from inner_layers. They encapsulate the ops related specifically to the variable, such as read, assign, and initialize. The linear ops in first_dense depend on the variable ops of first_dense, and likewise for second_dense.

The reason for this separation is that in distributed settings the variables are managed by a different task called the parameter server. It is usually run on a different device (CPU as opposed to GPU), sometimes even on a different machine. In other words, for tensorflow, variable management is by design different from matrix computation.

Having said that, I'd love to see a mode in tensorflow that would not split the scope into variables and ops but keep them coupled.

Other than this, the graph matches the code perfectly.
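You can also see this naming split without TensorBoard. A minimal standalone sketch (not your exact model): variables created by tf.layers.dense ignore the enclosing name_scope, while the ops pick it up:

import tensorflow as tf

tf.reset_default_graph()
with tf.name_scope('inner_layers'):
    x = tf.placeholder(tf.float32, shape=[None, 4])
    h = tf.layers.dense(x, units=2, name='first_dense')

# Variable names carry only the dense layer's own scope:
print([v.name for v in tf.global_variables()])
# e.g. ['first_dense/kernel:0', 'first_dense/bias:0']

# Op names do include the enclosing name_scope:
print(h.op.name)
# e.g. 'inner_layers/first_dense/BiasAdd'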
I have these two unnamed op tensors, logits and outputs, under a variable scope, but the lt command isn't listing these two tensors under the ops 'MatMul' and 'Softmax' during the tfdbg session after a test run on a checkpoint. Here is a snapshot of the code:
with tf.variable_scope(scope):
    d_inputs = dropout(inputs, keep_prob=keep_prob, is_train=is_train)
    d_memory = dropout(memory, keep_prob=keep_prob, is_train=is_train)
    JX = tf.shape(inputs)[1]
    with tf.variable_scope("attention"):
        inputs_ = tf.nn.relu(dense(d_inputs, hidden, use_bias=False, scope="inputs"))
        memory_ = tf.nn.relu(dense(d_memory, hidden, use_bias=False, scope="memory"))
        outputs = tf.matmul(inputs_, tf.transpose(memory_, [0, 2, 1])) / (hidden ** 0.5)
        mask = tf.tile(tf.expand_dims(mask, axis=1), [1, JX, 1])
        # The first tensor in question
        logits = tf.nn.softmax(softmax_mask(outputs, mask))
        # And the second tensor in question
        outputs = tf.matmul(logits, memory)
        res = tf.concat([inputs, outputs], axis=2)
What can I do to retrieve these tensors for testing purposes in tfdbg?

Alternatively, can I retrieve them in a normal tensorflow session by using tf.add_to_collection(op_name, tensor), as mentioned in this answer?
There is really no such thing as an "unnamed tensor". If you don't provide a name for an op, it will use a default name. The output tensors will usually be named using the operation's OpDef spec. If there is a single output, it can be named using just the op name; if the name is already taken, it will be made unique by appending _N to it.

You should see all tensors with a plain lt command. Are you giving some options to it?
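If you want the two tensors to show up under predictable names, one option is to pass explicit name arguments to the ops. A sketch based on the question's snippet (softmax_mask is the question's own helper; the names are illustrative):

# Explicit names make the tensors easy to locate in tfdbg output
logits = tf.nn.softmax(softmax_mask(outputs, mask), name="attention_softmax")
outputs = tf.matmul(logits, memory, name="attention_matmul")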
For the second question: tensorflow collections are basically just dictionaries that can be saved and restored "automatically". If you want a persistent, fixed name for some tensor, you can certainly save it in a collection and retrieve it after restoring from a checkpoint.
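A minimal sketch of that round trip (the collection key and checkpoint path are illustrative):

# Before saving: register the tensor under a fixed key
tf.add_to_collection('attention_logits', logits)
saver = tf.train.Saver()
saver.save(sess, 'model.ckpt')

# After restoring in another program: fetch it back by the same key
new_saver = tf.train.import_meta_graph('model.ckpt.meta')
new_saver.restore(sess, 'model.ckpt')
logits = tf.get_collection('attention_logits')[0]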
I want to save the final state of my LSTM such that it is included when I restore the model and can be used for prediction. As explained below, the Saver only has knowledge of the final state when I use tf.assign; however, this throws an error (also explained below).

During training I always feed the final LSTM state back into the network, as explained in this post. Here are the important parts of the code:

When building the graph:
self.init_state = tf.placeholder(tf.float32, [
    self.n_layers, 2, self.batch_size, self.n_hidden
])

state_per_layer_list = tf.unstack(self.init_state, axis=0)

rnn_tuple_state = tuple([
    tf.contrib.rnn.LSTMStateTuple(state_per_layer_list[idx][0],
                                  state_per_layer_list[idx][1])
    for idx in range(self.n_layers)
])

outputs, self.final_state = tf.nn.dynamic_rnn(
    cell, inputs=self.inputs, initial_state=rnn_tuple_state)
And during training:
_current_state = np.zeros((self.n_layers, 2, self.batch_size,
                           self.n_hidden))

_train_step, _current_state, summary = self.sess.run(
    [
        self.train_step, self.final_state,
        self.merged
    ],
    feed_dict={self.inputs: _inputs,
               self.labels: _labels,
               self.init_state: _current_state})
When I later restore my model from a checkpoint, the final state is not restored as well. As outlined in this post, the problem is that the Saver has no knowledge of the new state. The post also suggests a solution based on tf.assign. Regrettably, I cannot use the suggested

assign_op = tf.assign(self.init_state, _current_state)
self.sess.run(assign_op)

because self.init_state is not a Variable but a placeholder. I get the error

AttributeError: 'Tensor' object has no attribute 'assign'

I have tried to solve this problem for several hours now, but I can't get it to work. Any help is appreciated!
EDIT:

I have changed self.init_state to

self.init_state = tf.get_variable('saved_state',
    shape=[self.n_layers, 2, self.batch_size, self.n_hidden])

state_per_layer_list = tf.unstack(self.init_state, axis=0)

rnn_tuple_state = tuple([
    tf.contrib.rnn.LSTMStateTuple(state_per_layer_list[idx][0],
                                  state_per_layer_list[idx][1])
    for idx in range(self.n_layers)
])

outputs, self.final_state = tf.nn.dynamic_rnn(
    cell, inputs=self.inputs, initial_state=rnn_tuple_state)
And during training I don't feed a value for self.init_state:

_train_step, _current_state, summary = self.sess.run(
    [
        self.train_step, self.final_state,
        self.merged
    ],
    feed_dict={self.inputs: _inputs,
               self.labels: _labels})

However, I still can't run the assignment op. Now I get

TypeError: Expected float32 passed to parameter 'value' of op 'Assign', got (LSTMStateTuple(c=array([[ 0.07291573, -0.06366599, -0.23425588, ..., 0.05307654,
In order to save the final state, you can create a separate TF variable, then, before saving the graph, run an assign op to assign your latest state to that variable, and then save the graph. The only thing you need to keep in mind is to declare that variable BEFORE you declare the Saver; otherwise it won't be included in the graph.

This is discussed in great detail here, including working code: TF LSTM: Save State from training session for prediction session later

*** UPDATE: answers to follow-up questions ***

It looks like you are using BasicLSTMCell with state_is_tuple=True. The prior discussion that I referred you to used GRUCell with state_is_tuple=False. The details between the two are somewhat different, but the overall approach could be similar, so hopefully this will work for you.

During training, you first feed zeros as the initial_state into dynamic_rnn and then keep re-feeding its own output state back as the initial_state. So the LAST output state of the dynamic_rnn call is what you want to save for later. Since it results from a sess.run() call, it is essentially a numpy array (not a tensor and not a placeholder). So the question amounts to "how do I save a numpy array as a Tensorflow variable along with the rest of the variables in the graph?" That's why you assign the final state to a variable whose only purpose is that.

So, the code is something like this:
# GRAPH DEFINITIONS:
state_in = tf.placeholder(tf.float32, [LAYERS, 2, None, CELL_SIZE], name='state_in')
l = tf.unstack(state_in, axis=0)
state_tup = tuple(
    [tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
     for idx in range(LAYERS)])
# multicell = your BasicLSTMCell / MultiRNNCell definitions
output, state_out = tf.nn.dynamic_rnn(multicell, X, dtype=tf.float32, initial_state=state_tup)

savedState = tf.get_variable('savedState', shape=[LAYERS, 2, BATCHSIZE, CELL_SIZE])
saver = tf.train.Saver(max_to_keep=1)

in_state = np.zeros((LAYERS, 2, BATCHSIZE, CELL_SIZE))

# TRAINING LOOP:
feed_dict = {X: x, Y_: y_, batchsize: BATCHSIZE, state_in: in_state}
_, out_state = sess.run([training_step, state_out], feed_dict=feed_dict)
# Re-feed the output state as the next input state
in_state = out_state

# ONCE TRAINING IS OVER:
assignOp = tf.assign(savedState, out_state)
sess.run(assignOp)
saver.save(sess, pathModel + '/my_model.ckpt')

# RECOVERING IN A DIFFERENT PROGRAM:
gInit = tf.global_variables_initializer().run()
lInit = tf.local_variables_initializer().run()
new_saver = tf.train.import_meta_graph(pathModel + 'my_model.ckpt.meta')
new_saver.restore(sess, pathModel + 'my_model.ckpt')

# Retrieve the state and keep only its LAST batch entry (latest observations)
savedState = sess.run('savedState:0')  # this is the FULL state from training
state = savedState[:, :, -1, :]  # -1 keeps only the LAST batch of the state
state = np.reshape(state, [state.shape[0], 2, -1, state.shape[2]])  # [LAYERS, 2, 1 (BATCH), CELL_SIZE]

# x = .... (YOUR INPUTS)
feed_dict = {'X:0': x, 'state_in:0': state}

# PREDICTION LOOP:
preds, state = sess.run(['preds:0', 'state_out:0'], feed_dict=feed_dict)
# state is now re-fed into feed_dict on the next loop iteration
As mentioned, this is a modified version of an approach that works well for me with GRUCell, where state_is_tuple=False. I adapted it to try BasicLSTMCell with state_is_tuple=True. It works, but not as accurately as the original approach. I don't know yet whether it's just because for me GRU is better than LSTM, or for some other reason. See if this works for you.

Also keep in mind that, as you can see from the recovery and prediction code, your predictions will likely be based on a different batch size than your training loop (I guess a batch of 1?). So you have to think through how to handle your recovered state: just take the last batch? Or something else? This code takes only the last batch entry of the saved state (i.e. the most recent observations from training), because that's what was relevant for me.
I'm trying to get into tensorflow by setting up a network and then feeding data to it. For some reason I end up with the error message ValueError: setting an array element with a sequence. I made a minimal example of what I'm trying to do:
import tensorflow as tf

K = 10

lchild = tf.placeholder(tf.float32, shape=(K))
rchild = tf.placeholder(tf.float32, shape=(K))
parent = tf.nn.tanh(tf.add(lchild, rchild))

input = [tf.Variable(tf.random_normal([K])),
         tf.Variable(tf.random_normal([K]))]

with tf.Session() as sess:
    print(sess.run([parent], feed_dict={lchild: input[0], rchild: input[1]}))
Basically, I'm setting up a network with placeholders and a sequence of input embeddings that I want to learn, and then I try to run the network, feeding the input embeddings into it. From what I can tell by searching for the error message, there might be something wrong with my feed_dict, but I can't see any obvious mismatches in, e.g., dimensionality.

So, what did I miss, or how did I get this completely backwards?

EDIT: I've edited the above to clarify that the input represents embeddings that need to be learned. I guess the question can be asked more sharply as: is it possible to use placeholders for parameters?
The inputs should be numpy arrays. So, instead of tf.Variable(tf.random_normal([K])), simply write np.random.randn(K) and everything should work as expected.
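Applied to the minimal example from the question, that looks like this (a sketch; only the input list changes):

import numpy as np
import tensorflow as tf

K = 10

lchild = tf.placeholder(tf.float32, shape=(K))
rchild = tf.placeholder(tf.float32, shape=(K))
parent = tf.nn.tanh(tf.add(lchild, rchild))

# Plain numpy arrays instead of tf.Variable objects
inputs = [np.random.randn(K), np.random.randn(K)]

with tf.Session() as sess:
    print(sess.run([parent], feed_dict={lchild: inputs[0], rchild: inputs[1]}))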
EDIT (the question was clarified after my answer):

It is possible to use placeholders as parameters, but in a slightly different way. For example:
lchild = tf.placeholder(tf.float32, shape=(K))
rchild = tf.placeholder(tf.float32, shape=(K))
parent = tf.nn.tanh(tf.add(lchild, rchild))
loss = <some loss that depends on the parent tensor or lchild/rchild>

# Compute gradients with respect to the input placeholders
grads = tf.gradients(loss, [lchild, rchild])

inputs = [np.random.randn(K), np.random.randn(K)]
for i in range(<number of iterations>):
    np_grads = sess.run(grads, feed_dict={lchild: inputs[0], rchild: inputs[1]})
    inputs[0] -= 0.1 * np_grads[0]
    inputs[1] -= 0.1 * np_grads[1]
This is not, however, the best or easiest way to do it. The main problem is that at every iteration you need to copy numpy arrays in and out of the session (which potentially runs on a different device, like a GPU).

Placeholders are generally used to feed data external to the model (like texts or images). The way to solve this using tensorflow utilities would be something like:
lchild = tf.Variable(tf.random_normal([K]))
rchild = tf.Variable(tf.random_normal([K]))
parent = tf.nn.tanh(tf.add(lchild, rchild))
loss = <some loss that depends on the parent tensor or lchild/rchild>
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

for i in range(<number of iterations>):
    sess.run(train_op)

# Retrieve the weights back to numpy:
np_lchild = sess.run(lchild)