I'm trying to build a pixel-wise classification LSTM RNN using tensorflow. My model is displayed in the picture below. The problem I'm having is building a 3D LSTM RNN. The code that I have builds a 2D LSTM RNN, so I placed the code inside a loop, but now I get the following error:
ValueError: Variable RNN/BasicLSTMCell/Linear/Matrix does not exist, disallowed. Did you mean to set reuse=None in VarScope?
So here's the network:
The idea goes like this: an input image of size (200,200) is fed into an LSTM RNN of size (200,200,200). Each sequence output from the LSTM tensor vector (the pink boxes in the LSTM RNN) is fed into an MLP, which then makes a single output prediction, hence pixel-wise prediction (you can see how one input pixel generates one output "pixel").
So here's my code:
...
n_input_x = 200
n_input_y = 200
x = tf.placeholder("float", [None, n_input_x, n_input_y])
y = tf.placeholder("float", [None, n_input_x, n_input_y])
def RNN(x):
    x = tf.transpose(x, [1, 0, 2])
    x = tf.reshape(x, [-1, n_input_x])
    x = tf.split(0, n_steps, x)
    output_matrix = []
    for i in xrange(200):
        temp_vector = []
        lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0, state_is_tuple=True)
        outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
        for j in xrange(200):
            lstm_vector = outputs[j]
            pixel_pred = multilayer_perceptron(lstm_vector, mlp_weights, mlp_biases)
            temp_vector.append(pixel_pred)
        output_matrix.append(temp_vector)
        print i
    return output_matrix
temp = RNN(x)
pred = tf.placeholder(temp, [None, n_input_x, n_input_y])
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
...
You can see that I placed the call to RNN inside the first loop. In this way, I generate a new RNN every time. I know Tensorflow auto-increments other Tensors.
While debugging, I see:
(Pdb) lstm_cell
<tensorflow.python.ops.rnn_cell.BasicLSTMCell object at 0x7f9d26956850>
And for outputs I have a list of 200 BasicLSTMCell output tensors:
(Pdb) len(outputs)
200
...
<tf.Tensor 'RNN_2/BasicLSTMCell_199/mul_2:0' shape=(?, 200) dtype=float32>]
So ideally, I want the second call to RNN to generate LSTM tensors with indexes 200-399
I tried this, but it won't construct a RNN because the dimensions of 40000 and x (after the split) don't line up.
x = tf.reshape(x, [-1, n_input_x])
# Split to get a list of 'n_steps' tensors of shape (batch_size, n_hidden)
# This input shape is required by `rnn` function
x = tf.split(0, n_input_y, x)
lstm_cell = rnn_cell.BasicLSTMCell(40000, forget_bias=1.0, state_is_tuple=True)
outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
output_matrix = []
for i in xrange(200):
    temp_vector = []
    for j in xrange(200):
        lstm_vector = outputs[i*j]
I also tried getting rid of the split, but then it complains that the input must be a list. I then tried reshaping with x = tf.reshape(x, [n_input_x * n_input_y]), but it still says the input must be a list.
Here is what I have tried:
tf.reset_default_graph()
X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None,n_outputs])
layers = [tf.contrib.rnn.LSTMCell(num_units=n_neurons,
                                  activation=tf.nn.leaky_relu, use_peepholes=True)
          for layer in range(n_layers)]
multi_layer_cell = tf.contrib.rnn.MultiRNNCell(layers)
rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32)
tf.summary.histogram("outputs", rnn_outputs)
tf.summary.image("RNN",rnn_outputs)
I am getting the following error:
InvalidArgumentError: Tensor must be 4-D with last dim 1, 3, or 4, not [55413,4,100]
[[Node: RNN_1 = ImageSummary[T=DT_FLOAT, bad_color=Tensor<type: uint8 shape: [4] values: 255 0 0...>, max_images=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](RNN_1/tag, rnn/transpose_1)]]
Please help me visualize the RNN outputs inside the LSTM model that I am trying to run. This would help me understand more accurately what the LSTM is doing.
You can plot each RNN output as an image, with one axis being time and the other axis being the output units. Here is a small example:
import tensorflow as tf
import numpy as np
n_steps = 100
n_inputs = 10
n_neurons = 10
n_layers = 3
x = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
layers = [tf.contrib.rnn.LSTMCell(num_units=n_neurons,
activation=tf.nn.leaky_relu, use_peepholes=True)
for layer in range(n_layers)]
multi_layer_cell = tf.contrib.rnn.MultiRNNCell(layers)
rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, x, dtype=tf.float32)
# Time steps in horizontal axis, outputs in vertical axis, add last dimension for channel
rnn_out_imgs = tf.transpose(rnn_outputs, (0, 2, 1))[..., tf.newaxis]
out_img_sum = tf.summary.image("RNN", rnn_out_imgs, max_outputs=10)
init_op = tf.global_variables_initializer()
with tf.Session() as sess, tf.summary.FileWriter('log') as fw:
    sess.run(init_op)
    fw.add_summary(sess.run(out_img_sum, feed_dict={x: np.random.rand(10, n_steps, n_inputs)}))
You would get a visualization that could look like this:
Here the brighter pixels represent a stronger activation, so even if it is hard to tell exactly what is causing what, you can at least see whether any meaningful patterns arise.
Your RNN output has the wrong shape for tf.summary.image. The tensor should be four-dimensional with the dimensions' sizes given by [batch_size, height, width, channels].
In your code, you're calling tf.summary.image with rnn_outputs, which has shape [55413, 4, 100]. Assuming your images are 55413-by-100 pixels in size and that each pixel contains 4 channels (RGBA), I'd use tf.reshape to reshape rnn_outputs to [1, 55413, 100, 4]. Then you should be able to call tf.summary.image without error.
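As a shape-only illustration of the 4-D requirement, here is a minimal NumPy sketch. NumPy stands in for TensorFlow, and the sizes are made up and much smaller than the [55413, 4, 100] from the question. It shows two possible layouts: one single-channel image per sequence, or one image for the whole batch with the 4 time steps used as channels. Note that in the second case the sketch moves the axis with a transpose rather than a plain reshape, on the assumption that each value should stay attached to its own sequence, unit and time step.

```python
import numpy as np

# Hypothetical small sizes standing in for [55413, 4, 100] in the question.
batch, n_steps, n_units = 6, 4, 5
rnn_outputs = np.arange(batch * n_steps * n_units, dtype=np.float32)
rnn_outputs = rnn_outputs.reshape(batch, n_steps, n_units)

# Option A: one single-channel image per sequence.
# Units on the vertical axis, time on the horizontal axis, 1 channel.
imgs_a = np.transpose(rnn_outputs, (0, 2, 1))[..., np.newaxis]

# Option B: a single image for the whole batch, time steps as 4 channels.
# Sequences on the vertical axis, units on the horizontal axis.
imgs_b = np.transpose(rnn_outputs, (0, 2, 1))[np.newaxis, ...]

print(imgs_a.shape)  # [batch_size, height, width, channels] with 1 channel
print(imgs_b.shape)  # [1, height, width, channels] with 4 channels
```

Both results are four-dimensional with a last dimension of 1 or 4, which is what the error message asks for.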
I don't think I can help you visualize the RNN's operation, but when I was learning about RNNs and LSTMs, I found this article very helpful.
layer_1 = tf.layers.dense(inputs=layer_c, units=512, activation=tf.nn.tanh, name='layer1')
layer_2 = tf.layers.dense(inputs=layer_1, units=512, activation=tf.nn.tanh, name='layer2')
Here my layer_2 output is [batch_size,512]. I need to send this layer_2 output through a single lstm unit. But when I tried tf.nn.static_rnn it gives an error saying my input should be a sequence. How can I perform this task?
From the documentation for static_rnn, the inputs argument is expecting a list:
inputs: A length T list of inputs, each a Tensor of shape [batch_size, input_size], or a nested tuple of such elements.
In your case, T==1, so you can just pass it a single-element list containing your previous layer. To keep track of the internal cell and hidden states so that you can carry them across timesteps, you can add extra placeholders and pass them to static_rnn through the initial_state argument. Because cell.state_size is a tuple for LSTM cells (of (cell_state, hidden_state)), we have to pass a tuple for this argument, and a tuple will be returned for the output state.
Here is a minimal working example based on your code, just feeding placeholders of ones for the input at each timestep and tracking the internal states across time:
import tensorflow as tf
import numpy as np
num_timesteps = 6
batch_size = 3
num_input_feats = 100
num_lstm_units = 5
lstm_cell = tf.nn.rnn_cell.LSTMCell(num_lstm_units)
input_x = tf.placeholder(tf.float32, [None, num_input_feats], name='input')
input_c_state = tf.placeholder(tf.float32, [None, lstm_cell.state_size.c], name='c_state')
input_h_state = tf.placeholder(tf.float32, [None, lstm_cell.state_size.h], name='h_state')
layer_1 = tf.layers.dense(inputs=input_x, units=512, activation=tf.nn.tanh, name='layer1')
layer_2 = tf.layers.dense(inputs=layer_1, units=512, activation=tf.nn.tanh, name='layer2')
layer_2_next, next_state = tf.nn.static_rnn(lstm_cell, [layer_2], dtype=tf.float32,
                                            initial_state=(input_c_state, input_h_state))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # initialize the internal cell state and hidden state to zero
    cur_c_state = np.zeros([batch_size, lstm_cell.state_size.c], dtype="float32")
    cur_h_state = np.zeros([batch_size, lstm_cell.state_size.h], dtype="float32")
    for i in range(num_timesteps):
        # here is your single timestep of input
        cur_x = np.ones([batch_size, num_input_feats], dtype="float32")
        y_out, out_state = sess.run([layer_2_next, next_state],
                                    feed_dict={input_x: cur_x,
                                               input_c_state: cur_c_state,
                                               input_h_state: cur_h_state})
        cur_c_state, cur_h_state = out_state  # pass states along to the next timestep
        print(y_out)  # here is your single timestep of output
I'm making a LSTM neural network in Tensorflow.
The input tensor size is 92.
import tensorflow as tf
from tensorflow.contrib import rnn
import data
test_x, train_x, test_y, train_y = data.get()
# Parameters
learning_rate = 0.001
epochs = 100
batch_size = 64
display_step = 10
# Network Parameters
n_input = 28 # input size
n_hidden = 128 # number of hidden layers
n_classes = 20 # output size
# Placeholders
x = tf.placeholder(dtype=tf.float32, shape=[None, n_input])
y = tf.placeholder(dtype=tf.float32, shape=[None, n_classes])
# Network
def LSTM(x):
    W = tf.Variable(tf.random_normal([n_hidden, n_classes]), dtype=tf.float32) # weights
    b = tf.Variable(tf.random_normal([n_classes]), dtype=tf.float32) # biases
    x_shape = 92
    x = tf.transpose(x)
    x = tf.reshape(x, [-1, n_input])
    x = tf.split(x, x_shape)
    lstm = rnn.BasicLSTMCell(
        num_units=n_hidden,
        forget_bias=1.0
    )
    outputs, states = rnn.static_rnn(
        cell=lstm,
        inputs=x,
        dtype=tf.float32
    )
    output = tf.matmul( outputs[-1], W ) + b
    return output
# Train Network
def train(x):
    prediction = LSTM(x)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        output = sess.run(prediction, feed_dict={"x": train_x})
        print(output)
train(x)
I'm not getting any errors, but I'm feeding an input tensor of size 92, and the matrix multiplication in the LSTM function returns a list containing one result vector, when the desired amount is 92, one result vector per input.
Is the problem that I'm matrix multiplying only the last item in the outputs array? Like this:
output = tf.matmul( outputs[-1], W ) + b
instead of:
output = tf.matmul( outputs, W ) + b
This is the error I get when I do the latter:
ValueError: Shape must be rank 2 but is rank 3 for 'MatMul' (op: 'MatMul') with input shapes: [92,?,128], [128,20].
static_rnn builds the simplest kind of recurrent neural net (see the TensorFlow documentation). Its input should be a sequence of tensors. Say you want to feed in the four words "Hi", "how", "are", "you": your input placeholder should then hold four n-dimensional vectors, one per word, where n is the size of each input vector.
I think there's something wrong with your placeholder. It should be sized by the number of inputs to the RNN. 28 is the number of dimensions in each vector, and I believe 92 is the length of the sequence (so the LSTM cell is unrolled over 92 steps).
In the output list you will get one vector per step of the sequence, each of size equal to the number of hidden units.
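To make the "one output vector per step" point concrete, here is a shape-only NumPy sketch, not TensorFlow code. The random arrays are stand-ins for the list returned by static_rnn and for the W and b variables in the question; the batch size of 3 is made up. It contrasts projecting only the last step with projecting all 92 steps, either one at a time or by stacking the list and contracting over the hidden axis.

```python
import numpy as np

n_steps, batch_size, n_hidden, n_classes = 92, 3, 128, 20
rng = np.random.default_rng(0)

# Stand-in for the static_rnn result: a list of n_steps arrays,
# each of shape (batch_size, n_hidden).
outputs = [rng.standard_normal((batch_size, n_hidden)) for _ in range(n_steps)]
W = rng.standard_normal((n_hidden, n_classes))  # stand-in for the weights
b = rng.standard_normal(n_classes)              # stand-in for the biases

# Projecting only the last step gives a single (batch_size, n_classes) result.
last_only = outputs[-1] @ W + b

# Projecting every step gives one result per step.
per_step = [out @ W + b for out in outputs]

# Equivalent: stack to rank 3, then contract the hidden axis with W.
stacked = np.stack(outputs)                              # (92, 3, 128)
all_steps = np.tensordot(stacked, W, axes=([2], [0])) + b  # (92, 3, 20)

print(last_only.shape, len(per_step), all_steps.shape)
```

The rank-3 error in the question comes from handing the whole stacked sequence to a rank-2 matrix multiply. In TensorFlow, the per-step list comprehension with tf.matmul, or tf.tensordot on the stacked outputs, should play the same role as the NumPy calls here, though that mapping is an assumption and is not tested in this sketch.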
I'm building a LSTM RNN with Tensorflow that performs pixel-wise classification (or, maybe a better way to put it is, pixel-wise prediction?)
Bear with me as I explain the title.
The network looks like the following drawing...
The idea goes like this: an input image of size (200,200) is fed into an LSTM RNN of size (200,200,200). Each sequence output from the LSTM tensor vector (the pink boxes in the LSTM RNN) is fed into an MLP, which then makes a single output prediction, hence pixel-wise prediction (you can see how one input pixel generates one output "pixel").
The code looks like this (not all of the code, just parts that are needed):
...
n_input_x = 200
n_input_y = 200
x = tf.placeholder("float", [None, n_input_x, n_input_y])
y = tf.placeholder("float", [None, n_input_x, n_input_y])
def RNN(x):
    x = tf.transpose(x, [1, 0, 2])
    x = tf.reshape(x, [-1, n_input_x])
    x = tf.split(0, n_steps, x)
    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0, state_is_tuple=True)
    outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
    output_matrix = []
    for i in xrange(200):
        temp_vector = []
        for j in xrange(200):
            lstm_vector = outputs[j]
            pixel_pred = multilayer_perceptron(lstm_vector, mlp_weights, mlp_biases)
            temp_vector.append(pixel_pred)
        output_matrix.append(temp_vector)
        print i
    return output_matrix
temp = RNN(x)
pred = tf.placeholder(temp, [None, n_input_x, n_input_y])
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
...
I have confirmed that the output of RNN -- that is, what is stored in temp is a 200x200 array of <tf.Tensor 'Softmax_39999:0' shape=(?, 1) dtype=float32>
As you can see, I place temp in a tf.placeholder of the same shape (None for the batch size ... or do I need this?)... and the program just exits as if it completed running. Ideally what I want to see when I debug and print pred is something like <tf.Tensor shape=(200,200)>
When I debug, the first time I execute pred = tf.placeholder(temp, [None, n_input_x, n_input_y]) I get TypeError: TypeErro...32>]].",) and then it returns and I try again, and it says Exception AttributeError: "'NoneType' object has no attribute 'path'" in <function _remove at 0x7f1ab77c26e0> ignored
EDIT I also now realize that I need to place the lines
lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0, state_is_tuple=True)
outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
inside the first loop so that new 2D LSTM RNNs are generated. However, I'm getting an error about variable reuse: ValueError: Variable RNN/BasicLSTMCell/Linear/Matrix does not exist, disallowed. Did you mean to set reuse=None in VarScope?
So in other words, is it not auto-incrementing the RNN tensor name?
A more convenient way to report shapes is with tf.shape(). In your case:
size1 = tf.shape(temp)
sess = tf.Session()
size1_fetched = sess.run(size1, feed_dict = your_feed_dict)
That way, size1_fetched is an array like the shape you would get from NumPy. It also reports the concrete sizes for that particular feed_dict. For example, your [None, 200, 200] Tensor would come back as [64, 200, 200] for a batch of 64.
Another question: why do you have a placeholder in the middle of your graph? Do you plan to feed in precomputed image feature maps later on?
I find two kinds of implementations of RNN in tensorflow.
The first implementation is this (from line 124 to 129). It uses a loop to define each step of input in the RNN.
with tf.variable_scope("RNN"):
    for time_step in range(num_steps):
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)
        states.append(state)
The second implementation is this (from line 51 to 70). It doesn't use any loop to define each step of input in RNN.
def RNN(_X, _istate, _weights, _biases):
    # input shape: (batch_size, n_steps, n_input)
    _X = tf.transpose(_X, [1, 0, 2]) # permute n_steps and batch_size
    # Reshape to prepare input to hidden activation
    _X = tf.reshape(_X, [-1, n_input]) # (n_steps*batch_size, n_input)
    # Linear activation
    _X = tf.matmul(_X, _weights['hidden']) + _biases['hidden']
    # Define a lstm cell with tensorflow
    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Split data because rnn cell needs a list of inputs for the RNN inner loop
    _X = tf.split(0, n_steps, _X) # n_steps * (batch_size, n_hidden)
    # Get lstm cell output
    outputs, states = rnn.rnn(lstm_cell, _X, initial_state=_istate)
    # Linear activation
    # Get inner loop last output
    return tf.matmul(outputs[-1], _weights['out']) + _biases['out']
In the first implementation, I find there is no weight matrix between the input units and the hidden units; only the weight matrix between the hidden units and the output units is defined (from line 132 to 133).
output = tf.reshape(tf.concat(1, outputs), [-1, size])
softmax_w = tf.get_variable("softmax_w", [size, vocab_size])
softmax_b = tf.get_variable("softmax_b", [vocab_size])
logits = tf.matmul(output, softmax_w) + softmax_b
But in the second implementation, both of the weight matrices are defined (from line 42 to 47).
weights = {
    'hidden': tf.Variable(tf.random_normal([n_input, n_hidden])), # Hidden layer weights
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'hidden': tf.Variable(tf.random_normal([n_hidden])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
I wonder why?
The difference I noticed is that the code in the second implementation uses tf.nn.rnn, which takes a list of inputs, one per time step, and generates a list of outputs, one per time step.
(Inputs: A length T list of inputs, each a tensor of shape
[batch_size, input_size].)
So, if you check the code in the second implementation on line 62 the input data is shaped into n_steps * (batch_size, n_hidden)
# Split data because rnn cell needs a list of inputs for the RNN inner loop
_X = tf.split(0, n_steps, _X) # n_steps * (batch_size, n_hidden)
In the first implementation, they loop over the time steps themselves: at each step they feed the input for that step to the cell, get the corresponding output, and store it in the outputs list.
Code snippet from line 113 to 117:
outputs = []
state = self._initial_state
with tf.variable_scope("RNN"):
    for time_step in range(num_steps):
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)
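The explicit loop can be sketched in plain NumPy to show what "one cell, reused at every time step" means. This is only an illustration: a vanilla tanh RNN step is swapped in for the LSTM cell, and rnn_step, W_xh, W_hh and b_h are hypothetical names that do not appear in either implementation.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One step of a vanilla tanh RNN (a stand-in for the LSTM cell)."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

batch_size, num_steps, n_input, n_hidden = 2, 4, 3, 5
rng = np.random.default_rng(0)
inputs = rng.standard_normal((batch_size, num_steps, n_input))

# The weights are created once, outside the loop, and shared by every step.
# This sharing is what reuse_variables() arranges in the TensorFlow loop.
W_xh = rng.standard_normal((n_input, n_hidden))
W_hh = rng.standard_normal((n_hidden, n_hidden))
b_h = np.zeros(n_hidden)

state = np.zeros((batch_size, n_hidden))
outputs = []
for time_step in range(num_steps):
    # Feed the slice for this time step and carry the state forward.
    state = rnn_step(inputs[:, time_step, :], state, W_xh, W_hh, b_h)
    outputs.append(state)

print(len(outputs), outputs[0].shape)
```

A function such as tf.nn.rnn performs the same unrolling internally, which is why the second implementation has no visible loop.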
Coming to your second question:
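Look carefully at the way the inputs are fed to the RNN in the two implementations.
In the first implementation the inputs have shape batch_size x num_steps. They are integer word IDs, which the model turns into vectors of the hidden size with an embedding lookup before the RNN, so no separate input-to-hidden matrix is defined: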
self._input_data = tf.placeholder(tf.int32, [batch_size, num_steps])
Whereas in the second implementation the initial inputs are of shape (batch_size x n_steps x n_input). So a weight matrix is required to transform to the shape (n_steps x batch_size x hidden_size):
# Input shape: (batch_size, n_steps, n_input)
_X = tf.transpose(_X, [1, 0, 2]) # Permute n_steps and batch_size
# Reshape to prepare input to hidden activation
_X = tf.reshape(_X, [-1, n_input]) # (n_steps*batch_size, n_input)
# Linear activation
_X = tf.matmul(_X, _weights['hidden']) + _biases['hidden']
# Split data because rnn cell needs a list of inputs for the RNN inner loop
_X = tf.split(0, n_steps, _X) # n_steps * (batch_size, n_hidden)
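The shape bookkeeping in that snippet can be checked with a small NumPy sketch. NumPy is used here purely as a stand-in for the TensorFlow ops, and the tiny sizes and the all-ones weight matrix are made up for illustration.

```python
import numpy as np

batch_size, n_steps, n_input, n_hidden = 2, 3, 4, 5
X = np.arange(batch_size * n_steps * n_input, dtype=np.float64)
X = X.reshape(batch_size, n_steps, n_input)   # (batch_size, n_steps, n_input)

W_hidden = np.ones((n_input, n_hidden))       # stand-in for _weights['hidden']
b_hidden = np.zeros(n_hidden)                 # stand-in for _biases['hidden']

X_t = np.transpose(X, (1, 0, 2))              # (n_steps, batch_size, n_input)
X_flat = X_t.reshape(-1, n_input)             # (n_steps*batch_size, n_input)
X_proj = X_flat @ W_hidden + b_hidden         # (n_steps*batch_size, n_hidden)
steps = np.split(X_proj, n_steps, axis=0)     # n_steps * (batch_size, n_hidden)

print(len(steps), steps[0].shape)
```

Because the transpose puts the time axis first, each chunk produced by the split holds the projected inputs of the whole batch for one time step, which is the list layout the RNN function expects.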
I hope this is helpful...