I've been trying to get a CNN working for the Omniglot dataset (105 x 105 x 1 images) via two tutorials I found: CNN tutorial 1 and CNN tutorial 2, both of which work on the usual MNIST dataset (28 x 28 x 1 images).
I'm still struggling with a shaping conflict in the implementation (after about a week of on-and-off debugging). No one has been able to provide any help so far, but in the meantime I've narrowed things down enough to give a better description of the shaping error.
The majority of my code is as follows (skipping a few irrelevant bits here and there). My placeholders are defined like this:
x = tf.placeholder(tf.float32, shape=(None, 105, 105, 1) ) # placeholder for train data
y = tf.placeholder(tf.float32, shape=(None, 20) ) # placeholder for labels
lr = tf.placeholder(tf.float32,shape=(), name="learnRate") # for varying learning rates during training
The label dimension is 20 because there are 20 different characters per alphabet, so each label is a one-hot-encoded vector of length 20.
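For example (purely illustrative), the 4th character of an alphabet would be encoded as [0, 0, 0, 1, 0, ..., 0], a length-20 vector with a single 1.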
From here on my model is defined as follows (where I commented the resulting dimensions of each output):
# weight and bias dimension definitions
self.ConFltSize = 3
self.ConOutSize = 7
self.weights = {
    'wc1': tf.Variable(tf.random_normal([self.ConFltSize, self.ConFltSize, 1, 32], stddev=0.01, name='W0')),
    'wc2': tf.Variable(tf.random_normal([self.ConFltSize, self.ConFltSize, 32, 64], stddev=0.01, name='W1')),
    'wc3': tf.Variable(tf.random_normal([self.ConFltSize, self.ConFltSize, 64, 128], stddev=0.01, name='W2')),
    'wd1': tf.Variable(tf.random_normal([self.ConOutSize * self.ConOutSize * 128, 128], stddev=0.01, name='W3')),
    'out': tf.Variable(tf.random_normal([128, self.InLabels.shape[1]], stddev=0.01, name='W4')),
}
self.biases = {
    'bc1': tf.Variable(tf.random_normal([32], stddev=0.01, name='B0')),
    'bc2': tf.Variable(tf.random_normal([64], stddev=0.01, name='B1')),
    'bc3': tf.Variable(tf.random_normal([128], stddev=0.01, name='B2')),
    'bd1': tf.Variable(tf.random_normal([128], stddev=0.01, name='B3')),
    'out': tf.Variable(tf.random_normal([self.InLabels.shape[1]], stddev=0.01, name='B4')),
}
# Model definition + shaping results
# x = provide the input data
# weights = dictionary variables for weights
# biases = dictionary variables for biases
def Architecture(self, x, weights, biases):
    conv1 = self.conv(x, weights['wc1'], biases['bc1'])      # convolution layer 1
    conv1 = self.maxPool(conv1)                              # max pool layer 1
    # out shape -> [None, 53, 53, 32]
    conv2 = self.conv(conv1, weights['wc2'], biases['bc2'])  # convolution layer 2
    conv2 = self.maxPool(conv2)                              # max pool layer 2
    # out shape -> [None, 27, 27, 64]
    conv3 = self.conv(conv2, weights['wc3'], biases['bc3'])  # convolution layer 3
    conv3 = self.maxPool(conv3)                              # max pool layer 3
    # out shape -> [None, 14, 14, 128]
    flayer = tf.reshape(conv3, [-1, weights['wd1'].shape[0]])  # flatten the output from convo layer
    # for 7 x 7 x 128 this is -> [None, 6272]
    flayer = tf.add(tf.matmul(flayer, weights['wd1']), biases['bd1'])  # fully connected layer 1
    flayer = tf.nn.relu(flayer)
    # out shape -> [None, 128]
    out = tf.add(tf.matmul(flayer, weights['out']), biases['out'])  # do last set of output weight * vals + bias
    # out shape -> [None, 20]
    return out  # net input to output layer
Now, in my main program I feed input data to my model in batches, basically with:
out = self.Architecture(x, self.weights, self.biases) # Implement network architecture, and get output tensor (net input to output layer)
# normalize, softmax and entropy the net input, in comparison with provided labels
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=out, labels=y) ) # cost function
optimizer = tf.train.GradientDescentOptimizer(learning_rate=lr).minimize(cost) # gradient descent optimizer
pred = tf.equal(tf.argmax(out, 1), tf.argmax(y , 1)) # output true / false if predicted value matches label
accuracy = tf.reduce_mean(tf.cast(pred, tf.float32)) # percentage value of correct predictions
for i in range(iters):
    [BX, _, BY, _] = batch.split(trainX, trainY, Bsize)  # random split in batch size
    # data shapes: BX -> [160, 105, 105, 1], BY -> [160, 20]
    # Code bombs out after feeding with input data
    opt = sess.run(optimizer, feed_dict={lr: learnr, x: BX, y: BY})
The exception I then get from the sess.run command is:
'logits and labels must be broadcastable: logits_size=[640,20] labels_size=[160,20]\n\t [[Node: softmax_cross_entropy_with_logits = SoftmaxCrossEntropyWithLogits[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Add_1, softmax_cross_entropy_with_logits/Reshape_1)]]'
From this I interpret that the softmax is getting [640, 20] as input while it's expecting [160, 20]... I don't understand how or where the data could end up shaped as [640, 20].
Can anyone show me what I'm missing, or whether I'm misinterpreting the error?
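For reference, a small numpy-only sketch (illustrative, not my training code) of how a -1 reshape can multiply the batch dimension when the flattened row size doesn't match the tensor, using the commented shapes from above (conv3 -> [None, 14, 14, 128], while 'wd1' expects rows of 7 * 7 * 128 = 6272):
import numpy as np
batch, h, w, c = 160, 14, 14, 128            # conv3 shape for a batch of 160, per the comments above
flat_expected = 7 * 7 * 128                  # 6272, the row size 'wd1' was built for
conv3 = np.zeros((batch, h, w, c), dtype=np.float32)
flayer = conv3.reshape(-1, flat_expected)    # the -1 absorbs the mismatch instead of raising an error
print(flayer.shape)                          # (640, 6272): 160 * 14 * 14 * 128 / 6272 = 640 rows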
According to the tensorflow website, tf.reshape takes a tensor of a certain shape and maps it to a tensor of another shape. I want to map a tensor of size [600, 64] to a tensor of size [-1, 8, 8, 1] (in which the dimension at the -1 position is 600). This doesn't seem to be working though.
I am running this on TensorFlow with Python 3.6, and although it reshapes fine to something like [-1, 8, 8], it doesn't reshape to [-1, 8, 8, 1].
import tensorflow as tf
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import LabelBinarizer
# preprocessing method needed
def flatten(array):
    temp = []
    for j in array:
        temp.extend(j)
    return temp
# preprocess the data
digits = datasets.load_digits()
images = digits.images
images = [flatten(i) for i in images]
labels = digits.target
labels = LabelBinarizer().fit_transform(labels)
# the stats needed
width = 8
height = 8
alpha = 0.1
num_labels = 10
kernel_length = 3
batch_size = 10
channels = 1
# the tensorflow placeholders and reshaping
X = tf.placeholder(tf.float32, shape = [None, width * height * channels])
# AND NOW HERE IS WHERE THE ERROR STARTS
y_true = tf.placeholder(tf.float32, shape = [None, num_labels])
X = tf.reshape(X, [-1, 8, 8, 1])
# the convolutional model
conv1 = tf.layers.conv2d(X, filters = 32, kernel_size = [kernel_length, kernel_length])
conv2 = tf.layers.conv2d(conv1, filters = 64, kernel_size = [2, 2])
flatten = tf.reshape(X, [-1, 1])
dense1 = tf.layers.dense(flatten, units=50, activation = tf.nn.relu)
y_pred = tf.layers.dense(dense1, units=num_labels, activation = tf.nn.softmax)
# the loss and training functions
loss = tf.losses.mean_squared_error(labels=y_true, predictions=y_pred)
train = tf.train.GradientDescentOptimizer(alpha).minimize(loss)
# initializing the variables and the tf.session
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
# running the session
for i in range(batch_size):
    _, lossVal = sess.run((train, loss), feed_dict={X: images[:600], y_true: labels[:600]})
    print(lossVal)
I keep on getting this error:
ValueError: Cannot feed value of shape (600, 64) for Tensor 'Reshape:0', which has shape '(?, 8, 8, 1)'
And I feel like that should not be the case since 8 * 8 * 1 does equal 64.
images[:600]'s shape is (600, 64), which does not correspond to the placeholder expected shape, (None, 8, 8, 1).
Either reshape your data or change the shape of the placeholder.
Note that the fact that you originally defined the placeholder shape to be (None, 64) is inconsequential as you reshape it a few lines later.
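A quick sketch of the two options (a hedged illustration reusing the names from the question; X_img is just a new name so the placeholder isn't overwritten):
# Option 1: reshape the numpy data to match the 4-D placeholder before feeding it
feed_images = np.reshape(images[:600], (-1, 8, 8, 1))
# Option 2: keep feeding the flat (None, 64) data, but don't overwrite the placeholder;
# reshape into a separately named tensor and build the conv layers on top of it
X = tf.placeholder(tf.float32, shape=[None, width * height * channels])
X_img = tf.reshape(X, [-1, 8, 8, 1])
conv1 = tf.layers.conv2d(X_img, filters=32, kernel_size=[kernel_length, kernel_length])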
I have a problem with using a different dataset than the default one from TensorFlow.
I have code using the MNIST dataset to recognize digits. The application generates a graph, which is later imported by an Android app.
Now I would like to recognize digits and math operators (the basic ones: +, -, *, /).
I found a script to generate the data I need, and I have two .pickle files.
But even with a dataset that suits me, I still don't know how to import it into my app with TensorFlow.
I would be grateful for help with this, or for a different (maybe easier) solution.
EDIT
I made some changes to the code that were advised by gabriele.
Now I get this error:
(x, label) = train_pickle_reader('train.pickle')
ValueError: too many values to unpack (expected 2)
I found the description of the dataset I used:
Extracts trace groups from inkml files.
Converts extracted trace groups into images. Images are square shaped bitmaps with only black (value 0) and white (value 1) pixels. Black color denotes patterns (ROI).
Labels those images (according to inkml files).
Flattens images to one-dimensional vectors.
Converts labels to one-hot format.
Dumps training and testing sets separately into outputs folder.
Below is the code in Python:
import tensorflow as tf
import pickle
def train_pickle_reader(filename):
    with open(filename, 'rb') as f:
        x = pickle.load(f)
    # assuming x is already of the form (all_train_input, all_train_labels):
    return x

def test_pickle_reader(filename):
    with open(filename, 'rb') as f:
        x = pickle.load(f)
    # assuming x is already of the form (all_train_input, all_train_labels):
    return x

# Function to create a weight neuron using a random number. Training will assign a real weight later
def weight_variable(shape, name):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial, name=name)

# Function to create a bias neuron. Bias of 0.1 will help to prevent any 1 neuron from being chosen too often
def biases_variable(shape, name):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial, name=name)

# Function to create a convolutional neuron. Convolutes input from 4d to 2d. This helps streamline inputs
def conv_2d(x, W, name):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME', name=name)

# Function to create a neuron to represent the max input. Helps to make the best prediction for what comes next
def max_pool(x, name):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name=name)
# A way to input images (as 784 element arrays of pixel values 0 - 1)
x_input = tf.placeholder(dtype=tf.float32, shape=[None, 784], name='x_input')
# A way to input labels to show model what the correct answer is during training
y_input = tf.placeholder(dtype=tf.float32, shape=[None, 10], name='y_input')
# First convolutional layer - reshape/resize images
# A weight variable that examines batches of 5x5 pixels, returns 32 features (1 feature per bit value in 32 bit float)
W_conv1 = weight_variable([5, 5, 1, 32], 'W_conv1')
# Bias variable to add to each of the 32 features
b_conv1 = biases_variable([32], 'b_conv1')
# Reshape each input image into a 28 x 28 x 1 pixel matrix
x_image = tf.reshape(x_input, [-1, 28, 28, 1], name='x_image')
# Flattens filter (W_conv1) to [5 * 5 * 1, 32], multiplies by [None, 28, 28, 1] to associate each 5x5 batch with the
# 32 features, and adds biases
h_conv1 = tf.nn.relu(conv_2d(x_image, W_conv1, name='conv1') + b_conv1, name='h_conv1')
# Takes windows of size 2x2 and computes a reduction on the output of h_conv1 (computes max, used for better prediction)
# Images are reduced to size 14 x 14 for analysis
h_pool1 = max_pool(h_conv1, name='h_pool1')
# Second convolutional layer, reshape/resize images
# Does mostly the same as above but converts each 32 unit output tensor from layer 1 to a 64 feature tensor
W_conv2 = weight_variable([5, 5, 32, 64], 'W_conv2')
b_conv2 = biases_variable([64], 'b_conv2')
h_conv2 = tf.nn.relu(conv_2d(h_pool1, W_conv2, name='conv2') + b_conv2, name='h_conv2')
# Images at this point are reduced to size 7 x 7 for analysis
h_pool2 = max_pool(h_conv2, name='h_pool2')
# First dense layer, performing calculation based on previous layer output
# Each image is 7 x 7 at the end of the previous section and outputs 64 features, we want 32 x 32 neurons = 1024
W_dense1 = weight_variable([7 * 7 * 64, 1024], name='W_dense1')
# bias variable added to each output feature
b_dense1 = biases_variable([1024], name='b_dense1')
# Flatten each of the images into size [None, 7 x 7 x 64]
h_pool_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64], name='h_pool_flat')
# Multiply weights by the outputs of the flatten neuron and add biases
h_dense1 = tf.nn.relu(tf.matmul(h_pool_flat, W_dense1, name='matmul_dense1') + b_dense1, name='h_dense1')
# Dropout layer prevents overfitting or recognizing patterns where none exist
# Depending on what value we enter into keep_prob, it will apply or not apply dropout layer
keep_prob = tf.placeholder(dtype=tf.float32, name='keep_prob')
# Dropout layer will be applied during training but not testing or predicting
h_drop1 = tf.nn.dropout(h_dense1, keep_prob, name='h_drop1')
# Readout layer used to format output
# Weight variable takes inputs from each of the 1024 neurons from before and outputs an array of 10 elements
W_readout1 = weight_variable([1024, 10], name='W_readout1')
# Apply bias to each of the 10 outputs
b_readout1 = biases_variable([10], name='b_readout1')
# Perform final calculation by multiplying each of the neurons from dropout layer by weights and adding biases
y_readout1 = tf.add(tf.matmul(h_drop1, W_readout1, name='matmul_readout1'), b_readout1, name='y_readout1')
# Softmax cross entropy loss function compares expected answers (labels) vs actual answers (logits)
cross_entropy_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_input, logits=y_readout1))
# Adam optimizer aims to minimize loss
train_step = tf.train.AdamOptimizer(0.0001).minimize(cross_entropy_loss)
# Compare actual vs expected outputs to see if highest number is at the same index, true if they match and false if not
correct_prediction = tf.equal(tf.argmax(y_input, 1), tf.argmax(y_readout1, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Used to save the graph and weights
saver = tf.train.Saver()
# Run in with statement so session only exists within it
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Save the graph shape and node names to pbtxt file
    tf.train.write_graph(sess.graph_def, '.', 'advanced_mnist.pbtxt', False)

    (x, label) = train_pickle_reader('train.pickle')
    batch_size = 64  # the batch size you want to use
    num_batches = len(x) // batch_size

    # Train the model, running through the data 20000 times in batches of 64
    # Print out step # and accuracy every 100 steps and final accuracy at the end of training
    # Train by running train_step and apply dropout by setting keep_prob to 0.5
    for i in range(20000):
        for j in range(num_batches):
            x_batch = x[j * batch_size: (j + 1) * batch_size]
            label_batch = label[j * batch_size: (j + 1) * batch_size]
            train_step.run(feed_dict={x_input: x_batch, y_input: label_batch, keep_prob: 0.5})

    # Save the session with graph shape and node weights
    saver.save(sess, 'advanced_mnist.ckpt')

    # Make a prediction
    (x, labels) = test_pickle_reader('test.pickle')
    print(sess.run(y_readout1, feed_dict={x_input: x, keep_prob: 1.0}))
In your code, after instantiating a tf.Session(), the line batch = mnist_data.train.next_batch(50) calls a built-in function which returns a tuple of the form (input, label). In order to feed the network with your own data, you need to define a function that returns, e.g., a numpy array holding the input data and the associated labels. For example, assuming you have a pickle file containing your training data, your code should look something like:
def pikle_reader(filename):
    with open(filename, 'rb') as f:   # pickle files need to be opened in binary mode
        x = pickle.load(f)
    # assuming x is already of the form (all_train_input, all_train_labels):
    return x

[...]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    [...]

    # get your data:
    (x, label) = pikle_reader(filename)
    batch_size = 64  # the batch size you want to use
    num_batches = len(x) // batch_size

    for i in range(20000):  # number of epochs
        for j in range(num_batches):
            x_batch = x[j * batch_size: (j + 1) * batch_size]
            label_batch = label[j * batch_size: (j + 1) * batch_size]
            train_step.run(feed_dict={x_input: x_batch, y_input: label_batch, keep_prob: 0.5})
Here, feed_dict feeds the placeholder x_input with the values in x_batch and the placeholder y_input with label_batch. Then, in the session, the code runs the train_step operation.
When you want to make a prediction instead, the code is basically the same:
(x, label) = pikle_reader(test_data_filename)
print(sess.run(y_readout1, feed_dict={x_input: x, keep_prob: 1.0}))
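If you also want the test accuracy, a small sketch reusing the accuracy op and the keep_prob convention already defined in your code (assuming the test labels are one-hot, as in your dataset description) would be:
(x, label) = pikle_reader(test_data_filename)
# keep_prob is 1.0 so dropout is disabled during evaluation
print(sess.run(accuracy, feed_dict={x_input: x, y_input: label, keep_prob: 1.0}))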
I want to use a CNN for 1D data, so I decided to use a conv1d layer. The first two layers go fine, but when I create the second conv layer, I get this error:
ValueError: Dimensions must be equal, but are 1 and 586 for 'conv1_43/conv1d/Conv2D' (op: 'Conv2D') with input shapes: [586,1,1040,1], [1,5,586,6].
These are my data shapes:
trainX = dataX[0:616]
trainY = dataY[0:616]
testX = dataX[616:646]
testY = dataY[616:646]
trainX = np.expand_dims(trainX, axis=2)
testX = np.expand_dims(testX, axis=2)
#final shapes: train:(586,1040,1) test:(30,1040,1)
Here's the code:
def new_conv_layer(input, num_input_channels, filter_size, num_filters, name):
    with tf.variable_scope(name) as scope:
        # Shape of the filter-weights for the convolution
        shape = [filter_size, num_input_channels, num_filters]
        # Create new weights (filters) with the given shape
        weights = tf.Variable(tf.truncated_normal(shape, stddev=0.05))
        # Create new biases, one for each filter
        #biases = tf.Variable(tf.constant(0.05, shape=[num_filters]))
        # TensorFlow operation for convolution
        layer = tf.nn.conv1d(input, weights, 1, 'SAME')
        # Add the biases to the results of the convolution.
        #layer += biases
        return layer, weights

# Function for creating a new ReLU Layer
def new_relu_layer(input, name):
    with tf.variable_scope(name) as scope:
        # TensorFlow operation for ReLU
        layer = tf.nn.relu(input)
        return layer
# Convolutional Layer 1
layer_conv1, weights_conv1 = new_conv_layer(trainX, num_input_channels=586, filter_size=5, num_filters=6, name ="conv1")
# Pooling Layer 1new_pool_layer
layer_pool1 = max_pooling1d(layer_conv1, 3, 1, name="pool1")
# RelU layer 1
layer_relu1 = new_relu_layer(layer_pool1, name="relu1")
# Convolutional Layer 2
layer_conv2, weights_conv2 = new_conv_layer(input=layer_relu1, num_input_channels=1, filter_size=5, num_filters=16, name= "conv2")
# Pooling Layer 2
layer_pool2 = max_pooling1d(layer_conv2, 2, 1, name="pool2")
# RelU layer 2
layer_relu2 = new_relu_layer(layer_pool2, name="relu2")
What is the problem?
The input and kernel shapes passed to nn.conv1d are not right.
From the API doc:
Input tensor should be of shape: [batch, in_width, num_input_channels]
Kernel/weights should be of shape: [filter_width, num_input_channels, out_channels]
Your inputs end up with shape [586, 1, 1040, 1] but should be [586, 1040, 1], and num_input_channels is defined wrongly when calling new_conv_layer: it should be 1 in the first call and 6 in the second call.
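A hedged sketch of what the corrected calls could look like (x_input is a placeholder name introduced here for illustration; everything else reuses the question's own functions), assuming trainX keeps its (586, 1040, 1) shape:
x_input = tf.placeholder(tf.float32, shape=[None, 1040, 1])  # [batch, in_width, num_input_channels]
layer_conv1, weights_conv1 = new_conv_layer(x_input, num_input_channels=1, filter_size=5, num_filters=6, name="conv1")
layer_pool1 = max_pooling1d(layer_conv1, 3, 1, name="pool1")
layer_relu1 = new_relu_layer(layer_pool1, name="relu1")
layer_conv2, weights_conv2 = new_conv_layer(layer_relu1, num_input_channels=6, filter_size=5, num_filters=16, name="conv2")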
I am trying to implement a Fully Convolutional network (5 layers) in TensorFlow.
But after a short time of training, all my logits fall to 0.
Has anyone had the same problem before?
Here is how I implemented my CONV-ReLU-maxPOOL layer:
def conv_relu_layer(in_data, nb_filters, filter_shape):
    nb_in_channels = int(in_data.shape[3])
    conv_shape = [filter_shape[0], filter_shape[1],
                  nb_in_channels, nb_filters]
    weights = tf.Variable(
        tf.truncated_normal(conv_shape, mean=0., stddev=.05))
    bias = tf.Variable(
        tf.truncated_normal([nb_filters], mean=0., stddev=1.))
    output = tf.nn.conv2d(in_data, weights,
                          [1, 1, 1, 1], padding="SAME")
    output += bias
    output = tf.nn.relu(output)
    return output

def conv_relu_pool_layer(in_data, nb_filters, filter_shape, pool_shape,
                         pooling=tf.nn.max_pool):
    conv_out = conv_relu_layer(in_data, nb_filters, filter_shape)
    ksize = [1, pool_shape[0], pool_shape[1], 1]
    strides = [1, pool_shape[0], pool_shape[1], 1]
    return pooling(conv_out, ksize=ksize, strides=strides, padding="SAME")
Here is my network:
def create_network_5C(in_data, name="5C"):
    c1 = conv_relu_pool_layer(in_data, 64, [5, 5], [2, 2])
    c2 = conv_relu_pool_layer(c1, 128, [5, 5], [2, 2])
    c3 = conv_relu_pool_layer(c2, 256, [5, 5], [2, 2])
    c4 = conv_relu_pool_layer(c3, 64, [5, 5], [2, 2])
    return conv_relu_layer(c4, 2, [5, 5])
The loss function:
def loss(logits, labels, num_classes):
    with tf.name_scope('loss'):
        logits = tf.reshape(logits, (-1, num_classes))
        epsilon = tf.constant(value=1e-4)
        labels = tf.to_float(tf.reshape(labels, (-1, num_classes)))
        softmax = tf.nn.softmax(logits) + epsilon
        cross_entropy = -tf.reduce_sum(
            tf.multiply(labels * tf.log(softmax), head),
            reduction_indices=[1])
        cross_entropy_mean = tf.reduce_mean(cross_entropy)
        tf.add_to_collection('losses', cross_entropy_mean)
        loss = tf.add_n(tf.get_collection('losses'))
    return loss
My main routine:
batch_size = 5
# Load data
x = tf.placeholder (tf.float32, [None, 416, 416, 3], name="x")
y = tf.placeholder (tf.float32, [None, 416, 416, 1], name="y")
# Contrast normalization and computation
x_gcn = tf.map_fn (lambda img : tf.image.per_image_standardization (img), x)
logits = create_network_5C (x_gcn)
# Having label at the same dimension as the output
y_p = tf.nn.avg_pool (tf.sign (y),
ksize=[1,16,16,1], strides=[1,16,16,1], padding="SAME")
y_rshp = tf.reshape (y_p, [batch_size, 416//16, 416//16])
y_bin = tf.cast (y_rshp > .5, tf.int32)
y_1hot = tf.one_hot (y_bin, 2)
# Compute error
error = loss (logits, y_1hot, 2)
optimizer = tf.train.AdamOptimizer (learning_rate=args.eta).minimize (error)
# Run the session
init_op = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(init_op)
    err, _ = session.run([error, optimizer],
                         feed_dict={x: image_batch,
                                    y: label_batch})
I note that if I reduce my network to only 2 layers, the logits don't drop to 0, but it doesn't learn anything either. If I reduce it to 3 layers, they drop to 0, but only after many iterations (while with 5 layers they drop to 0 within a few batches).
Can this be linked to what is called the "vanishing gradient" problem?
If it's relevant, my specs are: Ubuntu 16.04 - Python 3.6.4 - tensorflow 1.6.0
[EDIT] My problem really looks like dead ReLUs, as mentioned here: StackOverflow: FCN training error, but my data is normalized (between roughly -2 and +2), and I have already tried changing the initial mean and stddev of my weights and biases.
[EDIT 2] I tried replacing the ReLUs with leaky ReLUs, or with softplus; in both cases the logits get stuck under 0.1 and the loss stays between 0.6 and 0.7.
Using a leaky ReLU was actually enough; I then just needed to let it train for a huge amount of time.
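For reference, a minimal sketch of that change inside conv_relu_layer (an assumed placement, not necessarily the exact code used):
# replace the plain ReLU with a leaky ReLU so negative pre-activations keep a small gradient
output = tf.nn.leaky_relu(output, alpha=0.1)   # instead of: output = tf.nn.relu(output)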
I have found two kinds of RNN implementations in TensorFlow.
The first implementation is this (from line 124 to 129). It uses a loop to define each step of the input in the RNN.
with tf.variable_scope("RNN"):
for time_step in range(num_steps):
if time_step > 0: tf.get_variable_scope().reuse_variables()
(cell_output, state) = cell(inputs[:, time_step, :], state)
outputs.append(cell_output)
states.append(state)
The second implementation is this (from line 51 to 70). It doesn't use any loop to define each step of the input in the RNN.
def RNN(_X, _istate, _weights, _biases):
    # input shape: (batch_size, n_steps, n_input)
    _X = tf.transpose(_X, [1, 0, 2])  # permute n_steps and batch_size
    # Reshape to prepare input to hidden activation
    _X = tf.reshape(_X, [-1, n_input])  # (n_steps*batch_size, n_input)
    # Linear activation
    _X = tf.matmul(_X, _weights['hidden']) + _biases['hidden']

    # Define a lstm cell with tensorflow
    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)

    # Split data because rnn cell needs a list of inputs for the RNN inner loop
    _X = tf.split(0, n_steps, _X)  # n_steps * (batch_size, n_hidden)

    # Get lstm cell output
    outputs, states = rnn.rnn(lstm_cell, _X, initial_state=_istate)

    # Linear activation
    # Get inner loop last output
    return tf.matmul(outputs[-1], _weights['out']) + _biases['out']
In the first implementation, I find there is no weight matrix between the input units and the hidden units; it only defines a weight matrix between the hidden units and the output units (from line 132 to 133).
output = tf.reshape(tf.concat(1, outputs), [-1, size])
softmax_w = tf.get_variable("softmax_w", [size, vocab_size])
softmax_b = tf.get_variable("softmax_b", [vocab_size])
logits = tf.matmul(output, softmax_w) + softmax_b
But in the second implementation, both of the weight matrices are defined (from line 42 to 47).
weights = {
    'hidden': tf.Variable(tf.random_normal([n_input, n_hidden])),  # Hidden layer weights
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'hidden': tf.Variable(tf.random_normal([n_hidden])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
I wonder why?
The difference I noticed is that the code in the second implementation uses tf.nn.rnn, which takes a list of inputs (one per time step) and generates a list of outputs (one per time step).
(Inputs: A length T list of inputs, each a tensor of shape [batch_size, input_size].)
So, if you check the code in the second implementation, on line 62 the input data is shaped into n_steps * (batch_size, n_hidden):
# Split data because rnn cell needs a list of inputs for the RNN inner loop
_X = tf.split(0, n_steps, _X) # n_steps * (batch_size, n_hidden)
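As a small illustration (hypothetical shapes, using the same old-style tf.split(axis, num, value) signature as the snippet above), the split just turns the stacked matrix into the per-time-step list that rnn.rnn expects:
n_steps, batch_size, n_hidden = 28, 128, 128      # example values, not taken from the linked code
_X = tf.zeros([n_steps * batch_size, n_hidden])   # (n_steps*batch_size, n_hidden)
inputs_list = tf.split(0, n_steps, _X)            # a Python list of n_steps tensors
# each element of inputs_list has shape (batch_size, n_hidden)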
In the first implementation, they loop through the n_time_steps, provide the input, get the corresponding output, and store it in the outputs list.
Code snippet from line 113 to 117:
outputs = []
state = self._initial_state
with tf.variable_scope("RNN"):
    for time_step in range(num_steps):
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)
Coming to your second question:
Notice carefully the way the inputs are being fed to the RNN in the two implementations.
In the first implementation, the inputs are already of shape batch_size x num_steps (here num_steps is the hidden size):
self._input_data = tf.placeholder(tf.int32, [batch_size, num_steps])
Whereas in the second implementation the initial inputs are of shape (batch_size x n_steps x n_input). So a weight matrix is required to transform them to the shape (n_steps x batch_size x hidden_size):
# Input shape: (batch_size, n_steps, n_input)
_X = tf.transpose(_X, [1, 0, 2]) # Permute n_steps and batch_size
# Reshape to prepare input to hidden activation
_X = tf.reshape(_X, [-1, n_input]) # (n_steps*batch_size, n_input)
# Linear activation
_X = tf.matmul(_X, _weights['hidden']) + _biases['hidden']
# Split data because rnn cell needs a list of inputs for the RNN inner loop
_X = tf.split(0, n_steps, _X) # n_steps * (batch_size, n_hidden)
I hope this is helpful...