Tensorflow LSTM - Matrix multiplication on LSTM cell

I'm making an LSTM neural network in TensorFlow.
The input tensor size is 92.
import tensorflow as tf
from tensorflow.contrib import rnn
import data
test_x, train_x, test_y, train_y = data.get()
# Parameters
learning_rate = 0.001
epochs = 100
batch_size = 64
display_step = 10
# Network Parameters
n_input = 28 # input size
n_hidden = 128 # number of hidden units in the LSTM cell
n_classes = 20 # output size
# Placeholders
x = tf.placeholder(dtype=tf.float32, shape=[None, n_input])
y = tf.placeholder(dtype=tf.float32, shape=[None, n_classes])
# Network
def LSTM(x):
    W = tf.Variable(tf.random_normal([n_hidden, n_classes]), dtype=tf.float32) # weights
    b = tf.Variable(tf.random_normal([n_classes]), dtype=tf.float32) # biases
    x_shape = 92
    x = tf.transpose(x)
    x = tf.reshape(x, [-1, n_input])
    x = tf.split(x, x_shape)
    lstm = rnn.BasicLSTMCell(
        num_units=n_hidden,
        forget_bias=1.0
    )
    outputs, states = rnn.static_rnn(
        cell=lstm,
        inputs=x,
        dtype=tf.float32
    )
    output = tf.matmul( outputs[-1], W ) + b
    return output
# Train Network
def train(x):
    prediction = LSTM(x)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # feed_dict is keyed by the placeholder tensor itself, not the string "x"
        output = sess.run(prediction, feed_dict={x: train_x})
        print(output)
train(x)
I'm not getting any errors. However, although I feed an input tensor of size 92, the matrix multiplication in the LSTM function returns a list containing only one result vector. I want 92 of them: one result vector per input.
Is the problem that I'm matrix multiplying only the last item in the outputs array? Like this:
output = tf.matmul( outputs[-1], W ) + b
instead of:
output = tf.matmul( outputs, W ) + b
This is the error I get when I do the latter:
ValueError: Shape must be rank 2 but is rank 3 for 'MatMul' (op: 'MatMul') with input shapes: [92,?,128], [128,20].

static_rnn builds the simplest kind of recurrent neural net (see the TF documentation). Its input should be a sequence of tensors. Say you want to input the 4 words "Hi", "how", "are", "you". Your input placeholder should then consist of four n-dimensional vectors (n being the size of each input vector), one per word.
I think there's something wrong with your placeholder: its shape should account for the number of inputs (time steps) to the RNN. 28 is the number of dimensions in each vector, and I believe 92 is the length of the sequence (i.e., the LSTM is unrolled over 92 cells).
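A minimal NumPy sketch of that input layout follows. It assumes the data is arranged as (batch, 92 time steps, 28 features), and the batch size of 4 is purely illustrative. It builds the Python list of 92 per-step arrays of shape (batch, 28), which is the form static_rnn takes. In TF 1.x, a similar split could be done with tf.unstack(x, axis=1) on a placeholder of shape [None, 92, 28], because the size of axis 1 is then known.

```python
import numpy as np

batch_size, n_steps, n_input = 4, 92, 28  # illustrative: 92 time steps of 28 features each

# One batch of sequences, laid out as (batch, time, features)
batch = np.random.rand(batch_size, n_steps, n_input).astype(np.float32)

# static_rnn expects a Python list with one (batch, features) array per time step
step_inputs = [batch[:, t, :] for t in range(n_steps)]

print(len(step_inputs), step_inputs[0].shape)  # 92 (4, 28)
```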
The output list will then hold one vector per time step (as many as the sequence length), each of size equal to the number of hidden units.
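To get one 20-class result per time step instead of only outputs[-1], you can stack the per-step outputs and multiply them by W in a single operation. The NumPy sketch below mirrors the shapes from the question: 92 outputs of shape (batch, 128), and W of shape (128, 20). The batch size is an illustrative assumption. Collapsing the time and batch axes before the product avoids the rank-3 MatMul error. In TF 1.x, the analogous ops would be tf.stack, tf.reshape and tf.matmul (or tf.einsum).

```python
import numpy as np

n_steps, batch_size, n_hidden, n_classes = 92, 4, 128, 20

# Stand-ins for the static_rnn outputs list and the output-layer parameters
outputs = [np.random.rand(batch_size, n_hidden) for _ in range(n_steps)]
W = np.random.rand(n_hidden, n_classes)
b = np.random.rand(n_classes)

stacked = np.stack(outputs)            # (92, batch, 128): rank 3, too many dims for a plain matmul
flat = stacked.reshape(-1, n_hidden)   # (92 * batch, 128): rank 2
logits = (flat @ W + b).reshape(n_steps, batch_size, n_classes)  # back to (92, batch, 20)

# The same contraction over the hidden axis in one einsum call
logits_einsum = np.einsum("tbh,hc->tbc", stacked, W) + b

print(logits.shape)  # (92, 4, 20)
```

The last time step of logits equals what the original outputs[-1] product computes, so nothing is lost by projecting every step.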

Related

keras input shape: Input incompatible with the layer

I've looked at a few similar questions but I still don't understand how to solve my problem.
I am trying to build a CNN that estimates how many particles hit a detector, based on what's essentially an oscilloscope trace of the energy released in the detector over time.
I have 100,000 events of 1024 time samples, which I split 80/20 as train/test, like so:
from sklearn.model_selection import train_test_split
train_to_test_ratio=0.8 #proportion of the dataset to include in the train split
X_train,X_test,Y_train,Y_test=train_test_split(NormSignals,labels,train_size=train_to_test_ratio)
no_outputs = 14 # maximum number of particles expected
# force the labels to have 14 binary digits, one for each of the possible outputs
Y_train=tf.one_hot(Y_train,no_outputs)
Y_test=tf.one_hot(Y_test,no_outputs)
When I try to define the input shape for the network I do so like this (full CNN code below):
# Define input to neural network (tensors of 1024 time samples x 1 amplitude per sample)
inputs = keras.Input(shape=(1024,1))
But it gives me the error: "Input 0 of layer Conv_1 is incompatible with the layer: expected ndim=4, found ndim=3. Full shape received: [None, 1024, 1]"
I thought the input shape was as simple as the shape of the data arrays being passed to the network. Can someone please explain what the correct shape of my data should be?
Thank you very much in advance!
Full CNN:
import numpy as np
from tensorflow import keras
# Following the architecture of the CNN from the image recognition lab (14/5/2020):
# Simple CNN:
class noiseLayer(keras.layers.Layer):
    def __init__(self,mean):
        super(noiseLayer, self).__init__()
        self.mean = mean
    def call(self, input):
        mean = self.mean
        return input + (np.random.poisson(mean))/mean
# Add data augmentation to produce a random flip of the data (the ECal is symmetrical)
# and add poissonian noise to all of the crystals - using large N and dividing by N normalises
# the noise to be approximately continuous between 0 and 1
data_augmentation = keras.Sequential([
    noiseLayer(mean = 1000)
], name='DataAugm')
# Define input to neural network (tensors of 1024 time samples x 1 amplitude per sample)
inputs = keras.Input(shape=(1024,1))
#x=inputs
x = data_augmentation(inputs)
# first convolutional block
x = keras.layers.Conv2D(16, kernel_size=(3,3), name='Conv_1')(x)
x = keras.layers.LeakyReLU(0.1)(x)
x = keras.layers.MaxPool2D((2,2), name='MaxPool_1')(x)
# second convolutional block
x = keras.layers.Conv2D(16, kernel_size=(3,3), name='Conv_2')(x)
x = keras.layers.LeakyReLU(0.1)(x)
x = keras.layers.MaxPool2D((2,2), name='MaxPool_2')(x)
# third convolutional block
x = keras.layers.Conv2D(32, kernel_size=(3,3), name='Conv_3')(x)
x = keras.layers.LeakyReLU(0.1)(x)
x = keras.layers.MaxPool2D((2,2), name='MaxPool_3')(x)
# Flatten output tensor of the last convolutional layer so it can be used as
# input to the dense layers
x = keras.layers.Flatten(name='Flatten')(x)
# dense network: 2 dense hidden layers with 64 neurons each, with ReLU activation
# Classifier
x = keras.layers.Dense(64, name='Dense_1')(x)
x = keras.layers.ReLU(name='ReLU_dense_1')(x)
#x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(64, name='Dense_2')(x)
x = keras.layers.ReLU(name='ReLU_dense_2')(x)
outputs = keras.layers.Dense(no_outputs, activation='softmax', name='Output')(x)
# Model definition
model = keras.Model(inputs=inputs, outputs=outputs, name='VGGlike_CNN')
# Print model summary
model.summary()
# Show model structure
keras.utils.plot_model(model, show_shapes=True)

The problem was that I was using 2D layers to try to solve a 1D problem.
Changing all the 2D layers to 1D now compiles without errors:
x = keras.layers.Conv1D(16, kernel_size=(3), name='Conv_1')(x)
x = keras.layers.LeakyReLU(0.1)(x)
x = keras.layers.MaxPool1D((2), name='MaxPool_1')(x)
# second convolutional block
x = keras.layers.Conv1D(16, kernel_size=(3), name='Conv_2')(x)
x = keras.layers.LeakyReLU(0.1)(x)
x = keras.layers.MaxPool1D((2), name='MaxPool_2')(x)
# third convolutional block
x = keras.layers.Conv1D(32, kernel_size=(3), name='Conv_3')(x)
x = keras.layers.LeakyReLU(0.1)(x)
x = keras.layers.MaxPool1D((2), name='MaxPool_3')(x)
# Flatten output tensor of the last convolutional layer so it can be used as
# input to the dense layers
x = keras.layers.Flatten(name='Flatten')(x)
# dense network: 2 dense hidden layers with 64 neurons each, with ReLU activation
# Classifier
x = keras.layers.Dense(64, name='Dense_1')(x)
x = keras.layers.ReLU(name='ReLU_dense_1')(x)
#x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(64, name='Dense_2')(x)
x = keras.layers.ReLU(name='ReLU_dense_2')(x)
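Why the 1D layers fit: keras.Input(shape=(1024,1)) produces batches of shape (None, 1024, 1), which is rank 3, and that is what Conv1D expects. Conv2D expects rank 4 (batch, rows, cols, channels), which explains the "expected ndim=4, found ndim=3" error. The pure-Python sketch below traces the sequence length through the three Conv1D/MaxPool1D blocks. It assumes Keras defaults for those layers: padding='valid' and stride 1 for Conv1D, and a pool stride equal to the pool size for MaxPool1D. This is handy for checking what Flatten will receive. The helper names are hypothetical.

```python
def conv1d_len(length, kernel_size, stride=1):
    """Output length of a 'valid' 1D convolution."""
    return (length - kernel_size) // stride + 1

def pool1d_len(length, pool_size):
    """Output length of 'valid' 1D max pooling whose stride equals pool_size."""
    return (length - pool_size) // pool_size + 1

def trace_shapes(length=1024, channels=1, blocks=((16, 3, 2), (16, 3, 2), (32, 3, 2))):
    """Follow (steps, channels) through Conv1D -> MaxPool1D blocks; the batch axis is omitted."""
    shapes = [(length, channels)]
    for filters, kernel, pool in blocks:
        length = conv1d_len(length, kernel)
        shapes.append((length, filters))
        length = pool1d_len(length, pool)
        shapes.append((length, filters))
    steps, chans = shapes[-1]
    return shapes, steps * chans  # per-layer shapes and the size Flatten produces

shapes, flat = trace_shapes()
for s in shapes:
    print(s)
print("Flatten ->", flat)  # Flatten -> 4032
```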

Tensorflow - replace MNIST on other dataset

I have a problem using a different dataset than the default one from TensorFlow.
I have code that uses the MNIST dataset to recognize digits. This application generates a graph, which is later imported by an Android app.
Now I would like to recognize digits and math operators (the basic ones: +, -, *, /).
I found a script to generate the data I need, and I now have two .pickle files.
But even with a suitable dataset, I still don't know how to load it into my app with TensorFlow.
I would be grateful for help with this, or for another (maybe easier) solution.
EDIT
I made some changes to the code, as advised by gabriele.
Now I get this error:
(x, label) = train_pickle_reader('train.pickle')
ValueError: too many values to unpack (expected 2)
I found the description of the dataset I used:
Extracts trace groups from inkml files.
Converts extracted trace groups into images. Images are square shaped bitmaps with only black (value 0) and white (value 1) pixels. Black color denotes patterns (ROI).
Labels those images (according to inkml files).
Flattens images to one-dimensional vectors.
Converts labels to one-hot format.
Dumps training and testing sets separately into outputs folder.
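The ValueError means that the object stored in train.pickle is not a two-element (inputs, labels) pair, so it cannot be unpacked into (x, label). The description above does not say how images and labels are grouped inside each pickle. The sketch below is therefore hypothetical. load_xy accepts either of two assumed layouts and converts them to NumPy arrays. Printing the type and length of what pickle.load returns on the real file is the quickest way to see which layout, if either, applies. The demo file, the 784-pixel size and the 14 classes are made up for illustration.

```python
import os
import pickle
import tempfile

import numpy as np

def load_xy(filename):
    """Load a pickle and return (inputs, labels) as float32 NumPy arrays.

    Hypothetical helper: it only understands two assumed layouts,
    (all_inputs, all_labels) or a sequence of (input, label) samples.
    """
    with open(filename, "rb") as f:  # pickle files must be opened in binary mode
        data = pickle.load(f)
    print("loaded", type(data).__name__, "with", len(data), "top-level items")
    if isinstance(data, dict):
        raise TypeError("dict with keys %r: choose the input/label keys by hand" % list(data)[:10])
    if isinstance(data, tuple) and len(data) == 2:
        inputs, labels = data  # assumed layout A: (all_inputs, all_labels)
    else:
        # assumed layout B: [(input, label), ...]; raises ValueError if items are not pairs
        inputs, labels = zip(*data)
    return np.asarray(inputs, dtype=np.float32), np.asarray(labels, dtype=np.float32)

# Demo on a throwaway file in layout B: 5 fake flattened images with one-hot labels
samples = [(np.zeros(784), np.eye(14)[k]) for k in range(5)]
path = os.path.join(tempfile.mkdtemp(), "demo.pickle")
with open(path, "wb") as f:
    pickle.dump(samples, f)
x, label = load_xy(path)
print(x.shape, label.shape)  # (5, 784) (5, 14)
```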
Below is the Python code:
import tensorflow as tf
import pickle
def train_pickle_reader(filename):
    with open(filename, 'rb') as f:
        x = pickle.load(f)
    # assuming x is already of the form (all_train_input, all_train_labels):
    return x
def test_pickle_reader(filename):
    with open(filename, 'rb') as f:
        x = pickle.load(f)
    # assuming x is already of the form (all_test_input, all_test_labels):
    return x
# Function to create a weight variable initialized with small random numbers. Training will assign real weights later
def weight_variable(shape, name):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial, name=name)
# Function to create a bias variable. A slightly positive bias (0.1) helps avoid "dead" ReLU units at the start of training
def biases_variable(shape, name):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial, name=name)
# Function to create a 2-D convolution with stride 1 and SAME padding (height and width are preserved)
def conv_2d(x, W, name):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME', name=name)
# Function to create 2x2 max pooling with stride 2, which halves height and width
def max_pool(x, name):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name=name)
# A way to input images (as 784 element arrays of pixel values 0 - 1)
x_input = tf.placeholder(dtype=tf.float32, shape=[None, 784], name='x_input')
# A way to input labels to show model what the correct answer is during training
y_input = tf.placeholder(dtype=tf.float32, shape=[None, 10], name='y_input')
# First convolutional layer - reshape/resize images
# A weight variable that examines batches of 5x5 pixels and returns 32 features
W_conv1 = weight_variable([5, 5, 1, 32], 'W_conv1')
# Bias variable to add to each of the 32 features
b_conv1 = biases_variable([32], 'b_conv1')
# Reshape each input image into a 28 x 28 x 1 pixel matrix
x_image = tf.reshape(x_input, [-1, 28, 28, 1], name='x_image')
# Flattens filter (W_conv1) to [5 * 5 * 1, 32], multiplies by [None, 28, 28, 1] to associate each 5x5 batch with the
# 32 features, and adds biases
h_conv1 = tf.nn.relu(conv_2d(x_image, W_conv1, name='conv1') + b_conv1, name='h_conv1')
# Takes windows of size 2x2 and computes a reduction on the output of h_conv1 (the max over each window)
# Images are reduced to size 14 x 14 for analysis
h_pool1 = max_pool(h_conv1, name='h_pool1')
# Second convolutional layer, reshape/resize images
# Does mostly the same as above but converts each 32 unit output tensor from layer 1 to a 64 feature tensor
W_conv2 = weight_variable([5, 5, 32, 64], 'W_conv2')
b_conv2 = biases_variable([64], 'b_conv2')
h_conv2 = tf.nn.relu(conv_2d(h_pool1, W_conv2, name='conv2') + b_conv2, name='h_conv2')
# Images at this point are reduced to size 7 x 7 for analysis
h_pool2 = max_pool(h_conv2, name='h_pool2')
# First dense layer, performing calculation based on previous layer output
# Each image is 7 x 7 at the end of the previous section and outputs 64 features, we want 32 x 32 neurons = 1024
W_dense1 = weight_variable([7 * 7 * 64, 1024], name='W_dense1')
# bias variable added to each output feature
b_dense1 = biases_variable([1024], name='b_dense1')
# Flatten each of the images into size [None, 7 x 7 x 64]
h_pool_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64], name='h_pool_flat')
# Multiply weights by the outputs of the flatten neuron and add biases
h_dense1 = tf.nn.relu(tf.matmul(h_pool_flat, W_dense1, name='matmul_dense1') + b_dense1, name='h_dense1')
# Dropout layer helps prevent overfitting (recognizing patterns where none exist)
# Depending on what value we enter into keep_prob, it will apply or not apply dropout layer
keep_prob = tf.placeholder(dtype=tf.float32, name='keep_prob')
# Dropout layer will be applied during training but not testing or predicting
h_drop1 = tf.nn.dropout(h_dense1, keep_prob, name='h_drop1')
# Readout layer used to format output
# Weight variable takes inputs from each of the 1024 neurons from before and outputs an array of 10 elements
W_readout1 = weight_variable([1024, 10], name='W_readout1')
# Apply bias to each of the 10 outputs
b_readout1 = biases_variable([10], name='b_readout1')
# Perform final calculation by multiplying each of the neurons from dropout layer by weights and adding biases
y_readout1 = tf.add(tf.matmul(h_drop1, W_readout1, name='matmul_readout1'), b_readout1, name='y_readout1')
# Softmax cross entropy loss function compares expected answers (labels) vs actual answers (logits)
cross_entropy_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_input, logits=y_readout1))
# Adam optimizer aims to minimize loss
train_step = tf.train.AdamOptimizer(0.0001).minimize(cross_entropy_loss)
# Compare actual vs expected outputs to see if highest number is at the same index, true if they match and false if not
correct_prediction = tf.equal(tf.argmax(y_input, 1), tf.argmax(y_readout1, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Used to save the graph and weights
saver = tf.train.Saver()
# Run in with statement so session only exists within it
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Save the graph shape and node names to pbtxt file
    tf.train.write_graph(sess.graph_def, '.', 'advanced_mnist.pbtxt', False)
    (x, label) = train_pickle_reader('train.pickle')
    batch_size = 64 # the batch size you want to use
    num_batches = len(x)//batch_size
    # Train the model, running through the data 20000 times in batches of 64
    # Train by running train_step and apply dropout by setting keep_prob to 0.5
    for i in range(20000):
        for j in range(num_batches):
            x_batch = x[j * batch_size: (j + 1) * batch_size]
            label_batch = label[j * batch_size: (j + 1)*batch_size]
            train_step.run(feed_dict={x_input: x_batch, y_input: label_batch, keep_prob: 0.5})
    # Save the session with graph shape and node weights
    saver.save(sess, 'advanced_mnist.ckpt')
    # Make a prediction
    (x, labels) = test_pickle_reader('test.pickle')
    print(sess.run(y_readout1, feed_dict={x_input: x, keep_prob: 1.0}))

In your code, after instantiating a tf.Session(), the line batch = mnist_data.train.next_batch(50) calls a built-in function that returns a tuple of the form (input, label). To feed the network with your own data, you need to define a function that returns, for example, NumPy arrays holding the input data and the associated labels. For example, assuming you have a pickle file containing your training data, your code should look something like:
def pikle_reader(filename):
    with open(filename, 'rb') as f:  # pickle files must be opened in binary mode
        x = pickle.load(f)
    # assuming x is already of the form (all_train_input, all_train_labels):
    return x
[...]
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    [...]
    # get your data:
    (x, label) = pikle_reader(filename)
    batch_size = 64 # the batch size you want to use
    num_batches = len(x)//batch_size
    for i in range(20000): # number of epochs
        for j in range(num_batches):
            x_batch = x[j*batch_size: (j+1)*batch_size]
            label_batch = label[j*batch_size: (j+1)*batch_size]
            train_step.run(feed_dict={x_input: x_batch, y_input: label_batch, keep_prob: 0.5})
Here, feed_dict feeds the placeholder x_input with the values in x_batch and the placeholder y_input with label_batch. The session then runs the train_step operation.
When you want to make a prediction, the code is basically the same:
(x, label) = pikle_reader(test_data_filename)
print(sess.run(y_readout1, feed_dict={x_input: x, keep_prob: 1.0}))
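The slicing loop in the training code has two quirks. It silently drops the last len(x) % batch_size samples, and it visits the batches in the same order every epoch. A hedged alternative is a small generator that shuffles once per epoch and keeps the final partial batch. The name iterate_minibatches is hypothetical. Each yielded pair could be passed to feed_dict in place of x_batch and label_batch. The 14 classes in the example (10 digits plus 4 operators) are an assumption.

```python
import numpy as np

def iterate_minibatches(inputs, labels, batch_size, shuffle=True, rng=None):
    """Yield (input_batch, label_batch) pairs that cover every sample exactly once."""
    inputs, labels = np.asarray(inputs), np.asarray(labels)
    assert len(inputs) == len(labels), "inputs and labels must have the same length"
    order = np.arange(len(inputs))
    if shuffle:
        (rng or np.random.default_rng()).shuffle(order)
    for start in range(0, len(inputs), batch_size):
        idx = order[start:start + batch_size]  # the last slice may be shorter than batch_size
        yield inputs[idx], labels[idx]

# Example: 10 fake samples of 784 pixels with 14 one-hot classes
x = np.zeros((10, 784), dtype=np.float32)
label = np.eye(14, dtype=np.float32)[np.arange(10) % 14]
batch_sizes = [len(xb) for xb, lb in iterate_minibatches(x, label, batch_size=4)]
print(batch_sizes)  # [4, 4, 2]
```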

unpack(unstack) an input (placeholder) with one None dimension in tensorflow

I am trying to use an LSTM with inputs that have different numbers of time steps (different numbers of frames). The input to rnn.static_rnn must be a sequence of tensors (not a single tensor!), so I have to convert my input to a sequence. I tried tf.unstack and tf.split, but both need to know the exact size of the inputs, while one dimension of my inputs (the time steps) changes from input to input. Following is part of my code:
n_input = 256*256 # data input (img shape: 256*256)
n_steps = None # timesteps
batch_size = 1
# tf Graph input
x = tf.placeholder("float", [ batch_size , n_input,n_steps])
y = tf.placeholder("float", [batch_size, n_classes])
# Permuting batch_size and n_steps
x1 = tf.transpose(x, [2, 1, 0])
x1 = tf.transpose(x1, [0, 2, 1])
x3=tf.unstack(x1,axis=0)
#or x3 = tf.split(x2, ?, 0)
# Define a lstm cell with tensorflow
lstm_cell = rnn.BasicLSTMCell(num_units=n_hidden, forget_bias=1.0)
# Get lstm cell output
outputs, states = rnn.static_rnn(lstm_cell, x3, dtype=tf.float32,sequence_length=None)
I get the following error when I use tf.unstack:
ValueError: Cannot infer num from shape (?, 1, 65536)
Also, there are some discussions here and here, but none of them were useful for me. Any help is appreciated.

As explained here, tf.unstack does not work if its num argument is unspecified and cannot be inferred from the tensor's static shape.
In your code, after the transpositions, x1 has the shape [n_steps, batch_size, n_input], and its size along axis=0 is None. The number of tensors to produce is therefore unknown when the graph is built.
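static_rnn unrolls a fixed Python list, so the number of steps must be known when the graph is built. The usual way around this in TF 1.x is tf.nn.dynamic_rnn. It takes a single [batch, time, features] tensor whose time dimension may be None and runs the time loop at execution time. To show what such a run-time loop computes, here is a NumPy sketch of an LSTM with the gate structure of BasicLSTMCell: input gate, candidate, forget gate and output gate, with forget_bias added to the forget gate. The sizes and random weights are illustrative assumptions, and the weight layout is not claimed to match TensorFlow's checkpoint format.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, kernel, bias, forget_bias=1.0):
    """Run an LSTM over x of shape (batch, time, features); time can be any length."""
    batch, n_steps, _ = x.shape
    n_hidden = bias.shape[0] // 4
    h = np.zeros((batch, n_hidden))
    c = np.zeros((batch, n_hidden))
    outputs = []
    for t in range(n_steps):  # loop length is read from the data, not fixed in advance
        z = np.concatenate([x[:, t, :], h], axis=1) @ kernel + bias
        i, j, f, o = np.split(z, 4, axis=1)  # input gate, candidate, forget gate, output gate
        c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
        h = np.tanh(c) * sigmoid(o)
        outputs.append(h)
    return np.stack(outputs, axis=1), (c, h)  # outputs: (batch, time, n_hidden)

rng = np.random.default_rng(0)
n_input, n_hidden = 8, 16  # small stand-ins for the 256*256 inputs
kernel = rng.normal(scale=0.1, size=(n_input + n_hidden, 4 * n_hidden))
bias = np.zeros(4 * n_hidden)

# The same weights handle sequences of different lengths
for n_steps in (3, 11):
    out, (c, h) = lstm_forward(rng.normal(size=(1, n_steps, n_input)), kernel, bias)
    print(n_steps, out.shape)
```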

Tensorflow reusing variables

I'm trying to build a pixel-wise classification LSTM RNN using tensorflow. My model is displayed in the picture below. The problem I'm having is building a 3D LSTM RNN. The code that I have builds a 2D LSTM RNN, so I placed the code inside a loop, but now I get the following error:
ValueError: Variable RNN/BasicLSTMCell/Linear/Matrix does not exist, disallowed. Did you mean to set reuse=None in VarScope?
So here's the network:
The idea goes like this: an input image of size (200,200) is the input to an LSTM RNN of size (200,200,200). Each sequence output from the LSTM tensor vector (the pink boxes in the LSTM RNN) is fed into an MLP, and the MLP then makes a single output prediction, ergo a pixel-wise prediction (you can see how one input pixel generates one output "pixel").
So here's my code:
...
n_input_x = 200
n_input_y = 200
x = tf.placeholder("float", [None, n_input_x, n_input_y])
y = tf.placeholder("float", [None, n_input_x, n_input_y])
def RNN(x):
    x = tf.transpose(x, [1, 0, 2])
    x = tf.reshape(x, [-1, n_input_x])
    x = tf.split(0, n_steps, x)
    output_matrix = []
    for i in xrange(200):
        temp_vector = []
        lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0, state_is_tuple=True)
        outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
        for j in xrange(200):
            lstm_vector = outputs[j]
            pixel_pred = multilayer_perceptron(lstm_vector, mlp_weights, mlp_biases)
            temp_vector.append(pixel_pred)
        output_matrix.append(temp_vector)
        print i
    return output_matrix
temp = RNN(x)
pred = tf.placeholder(temp, [None, n_input_x, n_input_y])
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
...
You can see that I placed the call to rnn.rnn inside the outer loop, so I generate a new RNN every time. I know TensorFlow auto-increments the names of other tensors.
While debugging, I have:
(Pdb) lstm_cell
<tensorflow.python.ops.rnn_cell.BasicLSTMCell object at 0x7f9d26956850>
and for outputs I have a list of 200 BasicLSTMCell output tensors:
(Pdb) len(outputs)
200
...
<tf.Tensor 'RNN_2/BasicLSTMCell_199/mul_2:0' shape=(?, 200) dtype=float32>]
So ideally, I want the second call to RNN to generate LSTM tensors with indexes 200-399.
I tried this, but it won't construct an RNN because the dimensions of 40000 and x (after the split) don't line up:
x = tf.reshape(x, [-1, n_input_x])
# Split to get a list of 'n_steps' tensors of shape (batch_size, n_hidden)
# This input shape is required by `rnn` function
x = tf.split(0, n_input_y, x)
lstm_cell = rnn_cell.BasicLSTMCell(40000, forget_bias=1.0, state_is_tuple=True)
outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
output_matrix = []
for i in xrange(200):
    temp_vector = []
    for j in xrange(200):
        lstm_vector = outputs[i*j]
I also tried getting rid of the split, but then it complains that the input must be a list. I then tried reshaping with x = tf.reshape(x, [n_input_x * n_input_y]), but it still says the input must be a list.
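Separately from the reuse error: if the 200 x 200 per-pixel outputs are kept in one flat sequence, pixel (i, j) sits at the row-major index i * 200 + j. A product such as i * j maps many pixels to the same position; for example, every pixel in row 0 or column 0 lands on index 0. A small pure-Python check, where flat_index is a hypothetical helper:

```python
def flat_index(row, col, n_cols):
    """Row-major position of (row, col) in a flattened grid with n_cols columns."""
    return row * n_cols + col

n_rows = n_cols = 200
row_major = {flat_index(i, j, n_cols) for i in range(n_rows) for j in range(n_cols)}
products = {i * j for i in range(n_rows) for j in range(n_cols)}

print(len(row_major))  # 40000: every pixel gets its own position
print(len(products))   # fewer distinct values, so some outputs would be reused
```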

What's the difference between two implementations of RNN in tensorflow?

I found two kinds of RNN implementations in TensorFlow.
The first implementation is this (lines 124 to 129). It uses a loop to define each input step of the RNN.
with tf.variable_scope("RNN"):
    for time_step in range(num_steps):
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)
        states.append(state)
The second implementation is this (lines 51 to 70). It doesn't use any loop to define each input step of the RNN.
def RNN(_X, _istate, _weights, _biases):
    # input shape: (batch_size, n_steps, n_input)
    _X = tf.transpose(_X, [1, 0, 2]) # permute n_steps and batch_size
    # Reshape to prepare input to hidden activation
    _X = tf.reshape(_X, [-1, n_input]) # (n_steps*batch_size, n_input)
    # Linear activation
    _X = tf.matmul(_X, _weights['hidden']) + _biases['hidden']
    # Define a lstm cell with tensorflow
    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Split data because rnn cell needs a list of inputs for the RNN inner loop
    _X = tf.split(0, n_steps, _X) # n_steps * (batch_size, n_hidden)
    # Get lstm cell output
    outputs, states = rnn.rnn(lstm_cell, _X, initial_state=_istate)
    # Linear activation
    # Get inner loop last output
    return tf.matmul(outputs[-1], _weights['out']) + _biases['out']
In the first implementation, I find there is no weight matrix between the input units and the hidden units. Only a weight matrix between the hidden units and the output units is defined (lines 132 to 133):
output = tf.reshape(tf.concat(1, outputs), [-1, size])
softmax_w = tf.get_variable("softmax_w", [size, vocab_size])
softmax_b = tf.get_variable("softmax_b", [vocab_size])
logits = tf.matmul(output, softmax_w) + softmax_b
But in the second implementation, both weight matrices are defined (lines 42 to 47):
weights = {
    'hidden': tf.Variable(tf.random_normal([n_input, n_hidden])), # Hidden layer weights
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'hidden': tf.Variable(tf.random_normal([n_hidden])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
Why is that?

The difference I noticed is that the second implementation uses tf.nn.rnn. It takes a list of inputs, one per time step, and generates a list of outputs, one per time step.
(Inputs: A length T list of inputs, each a tensor of shape [batch_size, input_size].)
So, if you check the code in the second implementation on line 62, the input data is shaped into n_steps * (batch_size, n_hidden):
# Split data because rnn cell needs a list of inputs for the RNN inner loop
_X = tf.split(0, n_steps, _X) # n_steps * (batch_size, n_hidden)
In the first implementation, they loop through the num_steps time steps. At each step they feed the input, get the corresponding output, and store it in the outputs list.
Code snippet from lines 113 to 117:
outputs = []
state = self._initial_state
with tf.variable_scope("RNN"):
    for time_step in range(num_steps):
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)
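Both styles step through time with the same cell. They differ only in who writes the loop and in how the per-step inputs are sliced out. The NumPy sketch below makes simplifying assumptions: a plain tanh RNN cell instead of an LSTM, and small random sizes. It checks that indexing inputs[:, t, :] inside a loop (first implementation) gives the same outputs as transposing, reshaping and splitting into a list up front (second implementation, minus its hidden projection).

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_steps, n_input, n_hidden = 2, 5, 3, 4
inputs = rng.normal(size=(batch, n_steps, n_input))  # (batch, time, features)
W_x = rng.normal(size=(n_input, n_hidden))
W_h = rng.normal(size=(n_hidden, n_hidden))

def cell(x_t, state):
    """Plain tanh RNN cell standing in for an LSTM: returns (output, new_state)."""
    new_state = np.tanh(x_t @ W_x + state @ W_h)
    return new_state, new_state

def run(step_inputs):
    """Feed per-step inputs through the cell, carrying the state forward."""
    state = np.zeros((batch, n_hidden))
    outputs = []
    for x_t in step_inputs:
        out, state = cell(x_t, state)
        outputs.append(out)
    return outputs

# First implementation: slice inputs[:, t, :] inside the time loop
loop_outputs = run(inputs[:, t, :] for t in range(n_steps))

# Second implementation: transpose, reshape and split into a list up front
# (rnn.rnn then runs the time loop internally)
flat = np.transpose(inputs, (1, 0, 2)).reshape(-1, n_input)  # (n_steps * batch, n_input)
step_list = np.split(flat, n_steps, axis=0)                  # n_steps arrays of (batch, n_input)
list_outputs = run(step_list)

print(all(np.allclose(a, b) for a, b in zip(loop_outputs, list_outputs)))  # True
```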
Coming to your second question:
Look carefully at how the inputs are fed to the RNN in the two implementations.
In the first implementation, the input placeholder has shape batch_size x num_steps. It holds word ids, which that model maps to vectors of the hidden size with an embedding lookup before they reach the cell:
self._input_data = tf.placeholder(tf.int32, [batch_size, num_steps])
In the second implementation, by contrast, the initial inputs have shape (batch_size x n_steps x n_input), so a weight matrix is required to transform them to the shape (n_steps x batch_size x hidden_size):
# Input shape: (batch_size, n_steps, n_input)
_X = tf.transpose(_X, [1, 0, 2]) # Permute n_steps and batch_size
# Reshape to prepare input to hidden activation
_X = tf.reshape(_X, [-1, n_input]) # (n_steps*batch_size, n_input)
# Linear activation
_X = tf.matmul(_X, _weights['hidden']) + _biases['hidden']
# Split data because rnn cell needs a list of inputs for the RNN inner loop
_X = tf.split(0, n_steps, _X) # n_steps * (batch_size, n_hidden)
I hope this is helpful...
