I was trying to implement an NN with one hidden layer, using TensorFlow, to recognize MNIST handwritten digits. I used gradient descent to train the NN. However, training did not seem to work at all: the testing accuracy never changed during the training process.
Can anyone help me figure out what went wrong?
Here is my code.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
batch_size = 100
n_batch = mnist.train.num_examples // batch_size
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
#First layer of the NN
W1 = tf.Variable(tf.zeros([784,10]))
b1 = tf.Variable(tf.zeros([10]))
out1 = tf.nn.softmax(tf.matmul(x, W1) + b1)
#Second layer of the NN
W2 = tf.Variable(tf.zeros([10,10]))
b2 = tf.Variable(tf.zeros([10]))
prediction = tf.nn.softmax(tf.matmul(out1, W2) + b2)
loss = tf.reduce_mean(tf.square(y - prediction))
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
init = tf.global_variables_initializer()
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(prediction, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(101):
        for batch in range(n_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            sess.run(train_step, feed_dict={x: batch_xs, y: batch_ys})
        acc = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
        print("Iter " + str(epoch) + ", Testing Accuracy " + str(acc))
Do not initialize your model with all zeros. If you do, it is likely that the gradient at that point (in the parameter space) is also zero. The gradient update is then nonexistent, so your parameters simply never change. To avoid that, use random initialization.
i.e.
Change
#First layer of the NN
W1 = tf.Variable(tf.zeros([784,10]))
b1 = tf.Variable(tf.zeros([10]))
out1 = tf.nn.softmax(tf.matmul(x, W1) + b1)
#Second layer of the NN
W2 = tf.Variable(tf.zeros([10,10]))
b2 = tf.Variable(tf.zeros([10]))
to
#First layer of the NN
W1 = tf.Variable(tf.truncated_normal([784,10], stddev=0.1))
b1 = tf.Variable(tf.truncated_normal([10], stddev=0.1))
out1 = tf.nn.sigmoid(tf.matmul(x, W1) + b1)
# out1 = tf.nn.softmax(tf.matmul(x, W1) + b1)
#Second layer of the NN
W2 = tf.Variable(tf.truncated_normal([10,10], stddev=0.1))
b2 = tf.Variable(tf.truncated_normal([10],stddev=0.1))
Now the model is able to train. You'll also see that I removed the softmax nonlinearity from the first layer and substituted a sigmoid. I did that because a softmax layer imposes a restriction on its output: it forces the layer's outputs to add up to one (that's one reason it's usually used only in the very last layer, to give the final output a probability interpretation). In a quick test, this restriction caused the model to stop learning at 30% accuracy; with a sigmoid the accuracy reached 89%, a much better performance.
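To see that constraint concretely, here is a tiny NumPy check (purely illustrative, not part of the fix) showing that softmax always forces its outputs to sum to one:
import numpy as np
z = np.array([1.0, 2.0, 3.0])      # arbitrary pre-activations
p = np.exp(z) / np.sum(np.exp(z))  # softmax
print(p)                           # approximately [0.09, 0.245, 0.665]
print(p.sum())                     # 1.0 -- the outputs always sum to one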
Other examples of nonlinearities you could use in intermediate layers (see the sketch after this list):
Hyperbolic tangent
ReLU
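As a rough sketch, either one is a drop-in replacement for the sigmoid line above (same W1 and b1 shapes):
# hyperbolic tangent as the hidden nonlinearity
out1 = tf.nn.tanh(tf.matmul(x, W1) + b1)
# or a ReLU
out1 = tf.nn.relu(tf.matmul(x, W1) + b1)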
Related
I have a CNN model for image classification which I have trained on my dataset. The model goes something like this:
Convolution
Relu
pooling
Convolution
Relu
Convolution
Relu
pooling
flat
fully connected (FC1)
Relu
fully connected (FC2)
softmax
After training, I want to get the feature vector for an image that I input to the pre-trained model, i.e. I want to get the output of the FC1 layer. Is there any way to get it? I browsed the web but couldn't find anything useful, so any suggestions would be of great help.
Training script
# input
x = tf.placeholder(tf.float32, shape=[None, img_size_h, img_size_w, num_channels], name='x')
# labels
y_true = tf.placeholder(tf.float32, shape=[None, num_classes], name='y_true')
y_true_cls = tf.argmax(y_true, axis=1)
y_pred = build_model(x) # Builds model architecture
y_pred_cls = tf.argmax(y_pred, axis=1)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=y_pred, labels=y_true)
cost = tf.reduce_mean(cross_entropy)
optimizer = tf.train.MomentumOptimizer(learn_rate, 0.9, use_locking=False, use_nesterov=True).minimize(cost)
accuracy = tf.reduce_mean(tf.cast(tf.equal(y_pred_cls, y_true_cls), tf.float32))
sess = tf.Session()
sess.run(tf.global_variables_initializer())
tf_saver = tf.train.Saver()
train(num_iteration) # Trains the network and saves the model
sess.close()
Testing script
sess = tf.Session()
tf_saver = tf.train.import_meta_graph('model/model.meta')
tf_saver.restore(sess, tf.train.latest_checkpoint('model'))
x = tf.get_default_graph().get_tensor_by_name('x:0')
y_true = tf.get_default_graph().get_tensor_by_name('y_true:0')
y_true_cls = tf.argmax(y_true, axis=1)
y_pred = tf.get_default_graph().get_tensor_by_name('y_pred:0') # refers to FC2 in the model
y_pred_cls = tf.argmax(y_pred, axis=1)
correct_prediction = tf.equal(y_pred_cls, y_true_cls)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
images, labels = read_data() # read data for testing
feed_dict_test = {x: images, y_true: labels}
test_acc = sess.run(accuracy, feed_dict=feed_dict_test)
sess.close()
You can just run sess.run on the right tensor to get the values. First you need the tensor. You can give it a name inside build_model by adding a name argument (which you can do for any tensor), e.g. (note tf.matmul rather than tf.multiply here: a fully connected layer is a matrix multiplication, not an elementwise product):
FC1 = tf.add(tf.matmul(Flat, W1), b1, name="FullyConnected1")
Later, you can get the tensor for the fully connected layer and evaluate it:
with tf.Session() as sess:
    FC1 = tf.get_default_graph().get_tensor_by_name('FullyConnected1:0')
    FC1_values = sess.run(FC1, feed_dict={x: input_img_arr})
(This is assuming there is no other layer called FullyConnected1 in the graph)
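If the layer was not given an explicit name when the model was built, one fallback (a sketch; the op name 'Add_1' below is purely hypothetical, yours will differ) is to list the operations in the restored graph and find the one that produces FC1:
graph = tf.get_default_graph()
# print every op name in the graph and look for the one matching FC1
for op in graph.get_operations():
    print(op.name)
# suppose the listing shows FC1's output op is called 'Add_1' (hypothetical):
# FC1 = graph.get_tensor_by_name('Add_1:0')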
I'm trying to implement a 1D convolutional neural network in TensorFlow. This is the code used for creating the placeholders, convolution layers, and max pooling layers:
import tensorflow as tf
import numpy as np  # used below to compute the flattened size
import math
try:
    from tqdm import tqdm
except ImportError:
    def tqdm(x, *args, **kwargs):
        return x
sess = tf.InteractiveSession()
# These will be inputs
## Input pixels, image with one channel (gray)
length=458
x = tf.placeholder("float", [None, length])
# Note that -1 is for reshaping
x_im = tf.reshape(x, [-1,length,1])
## Known labels
# None works during variable creation to be
# unspecified size
y_ = tf.placeholder("float", [None,2])
# Conv layer 1
num_filters1 = 2
winx1 = 3
W1 = tf.Variable(tf.truncated_normal(
        [winx1, 1, num_filters1],
        stddev=1./math.sqrt(winx1)))
b1 = tf.Variable(tf.constant(0.1,
        shape=[num_filters1]))
# stride-5 convolution, pad with zeros on edges
xw = tf.nn.conv1d(x_im, W1,
        stride=5,
        padding='SAME')
h1 = tf.nn.relu(xw + b1)
# width-2 max pooling, no padding on edges
p1 = tf.layers.max_pooling1d(h1, pool_size=2,
        strides=1, padding='VALID')
# Conv layer 2
num_filters2 = 2
winx2 = 3
W2 = tf.Variable(tf.truncated_normal(
        [winx2, num_filters1, num_filters2],
        stddev=1./math.sqrt(winx2)))
b2 = tf.Variable(tf.constant(0.1,
        shape=[num_filters2]))
# stride-3 convolution, pad with zeros on edges
p1w2 = tf.nn.conv1d(p1, W2,
        stride=3, padding='SAME')
h2 = tf.nn.relu(p1w2 + b2)
# width-2 max pooling, no padding on edges
p2 = tf.layers.max_pooling1d(h2, pool_size=2,
        strides=1, padding='VALID')
# Need to flatten convolutional output
p2_size = np.prod(
        [s.value for s in p2.get_shape()[1:]])
p2f = tf.reshape(p2, [-1, p2_size])
# Dense layer
num_hidden = 2
W3 = tf.Variable(tf.truncated_normal(
        [p2_size, num_hidden],
        stddev=2./math.sqrt(p2_size)))
b3 = tf.Variable(tf.constant(0.2,
        shape=[num_hidden]))
h3 = tf.nn.relu(tf.matmul(p2f, W3) + b3)
# Dropout during training
keep_prob = tf.placeholder("float")
h3_drop = tf.nn.dropout(h3, keep_prob)
# Output Layer
W4 = tf.Variable(tf.truncated_normal(
        [num_hidden, 2],
        stddev=1./math.sqrt(num_hidden)))
b4 = tf.Variable(tf.constant(0.1, shape=[2]))
# Just initialize
sess.run(tf.global_variables_initializer())
# Define model
y = tf.nn.softmax(tf.matmul(h3_drop, W4) + b4)
### End model specification, begin training code
After constructing the model, it's time to define the loss function as follows:
# Climb on cross-entropy
cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(
            logits=y + 1e-50, labels=y_))
# How we train
train_step = tf.train.GradientDescentOptimizer(
        0.01).minimize(cross_entropy)
# Define accuracy
correct_prediction = tf.equal(tf.argmax(y, 1),
        tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(
        correct_prediction, "float"))
But when I try to train the model using the following code:
# Actually train
epochs = 10
train_acc = np.zeros(epochs//10)
test_acc = np.zeros(epochs//10)
for i in tqdm(range(epochs), ascii=True):
    # Record summary data, and the accuracy
    if i % 10 == 0:
        # Check accuracy on train set
        A = accuracy.eval(feed_dict={x: train,
                y_: onehot_train, keep_prob: 1.0})
        train_acc[i//10] = A
        # And now the validation set
        A = accuracy.eval(feed_dict={x: test,
                y_: onehot_test, keep_prob: 1.0})
        test_acc[i//10] = A
    train_step.run(feed_dict={x: train,
            y_: onehot_train, keep_prob: 0.5})
It returns an error:
ValueError: Cannot feed value of shape (7487, 458) for Tensor
'Placeholder_8:0', which has shape '(?, 1, 458)'
I have 7487 1-D signals, each of length 458. Can somebody help me?
You just need to reshape your input:
train = np.reshape(train, [-1, length, 1])
test = np.reshape(test, [-1, length, 1])
And you're good to go!
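As a quick sanity check (a standalone sketch; it assumes train really has shape (7487, 458), as the error message suggests, with length = 458 as in the question):
import numpy as np
length = 458
train = np.random.rand(7487, length)        # stand-in for the real signals
train = np.reshape(train, [-1, length, 1])  # add an explicit channel dimension
print(train.shape)                          # (7487, 458, 1): a batch of 1-D signals with one channel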
I'm new to TensorFlow and my code is below:
import tensorflow as tf
logdir="/tmp/mnist_tutorial5/"
mnist = tf.contrib.learn.datasets.mnist.read_data_sets(train_dir=logdir+"data",one_hot = True)
tf.reset_default_graph()
sess = tf.Session()
writer = tf.summary.FileWriter(logdir)
def model(input):
    w = tf.Variable(tf.truncated_normal([784,10], stddev=0.1), name="W")
    b = tf.Variable(tf.constant(0.1, shape=[10]), name="B")
    act = tf.matmul(input, w) + b
    tf.summary.histogram("weights", w)
    tf.summary.histogram("biases", b)
    tf.summary.histogram("activations", act)
    return act

def train():
    x = tf.placeholder(tf.float32, shape=[None, 784], name="input_img")
    y = tf.placeholder(tf.float32, shape=[None, 10], name="labels")
    my_label = model(x)
    print("linear_regression is completed")
    mean_error = tf.reduce_mean(tf.reduce_sum(tf.square(my_label-y)))
    tf.summary.scalar("loss", mean_error)
    train_step = tf.train.GradientDescentOptimizer(0.0003).minimize(mean_error)
    correct_prediction = tf.equal(tf.argmax(my_label, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar("accuracy", accuracy)
    sess.run(tf.global_variables_initializer())
    summ = tf.summary.merge_all()
    for i in range(2000):
        batch = mnist.train.next_batch(100)
        train_accuracy = sess.run(train_step, feed_dict={x: batch[0], y: batch[1]})
        print("%s th iteration" % i)
        if i % 500 == 0:
            print("over 2")
            summarys = sess.run(summ, {x: batch[0], y: batch[1]})  # I'm getting the error here
            print("over 3")
            writer.add_summary(summarys, i)
    print("one over")

train()
writer.add_graph(sess.graph)
And this is the error I am getting:
InvalidArgumentError (see above for traceback): Nan in summary histogram for: weights
[[Node: weights = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](weights/tag, W/read)]]
First of all, it's just a one-layer network with no hidden layer,
and you have not applied any kind of activation function.
No activation means your outputs are not being squashed,
so your gradients explode, and the exploding weight updates can produce NaN.
You are training for 2000 iterations, which means the weights will be NaN after only a few of them.
Try using an activation function like sigmoid and add at least one hidden layer, as sketched below, and you will be fine.
Also reduce the number of iterations: for MNIST classification you do not need the model to train for 2000 iterations with this setup. It's a waste of time.
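For example, here is a minimal sketch of the model function rewritten with one sigmoid hidden layer (the hidden size of 100 is an arbitrary choice for illustration, not something prescribed):
def model(input):
    # hidden layer: 784 -> 100, with a sigmoid to squash the activations
    w1 = tf.Variable(tf.truncated_normal([784, 100], stddev=0.1), name="W1")
    b1 = tf.Variable(tf.constant(0.1, shape=[100]), name="B1")
    hidden = tf.nn.sigmoid(tf.matmul(input, w1) + b1)
    # output layer: 100 -> 10, left as raw scores like the original
    w2 = tf.Variable(tf.truncated_normal([100, 10], stddev=0.1), name="W2")
    b2 = tf.Variable(tf.constant(0.1, shape=[10]), name="B2")
    act = tf.matmul(hidden, w2) + b2
    tf.summary.histogram("weights", w2)
    tf.summary.histogram("biases", b2)
    tf.summary.histogram("activations", act)
    return act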
I built a CNN using TensorFlow. The network worked fine, but I had a problem: I couldn't visualize and plot graphs from the learning process.
Therefore I implemented the necessary commands in order to use TensorBoard, following this tutorial.
However, when I run the code I get the following error message:
AttributeError: 'module' object has no attribute 'scalar'
The error refers to the following commands (specifically the lines marked with **):
in the main function:
W_conv1 = weight_variable([first_conv_kernel_size, first_conv_kernel_size, 1, first_conv_output_channels])
with tf.name_scope('weights'):
    **variable_summaries(W_conv1)**
in variable_summaries function:
def variable_summaries(var):
    with tf.name_scope('summaries'):
        mean = tf.reduce_mean(var)
        **tf.summary.scalar('mean', mean)**
What does this error message mean? I followed the tutorial step by step and couldn't find the mistake.
Appreciate your help, thanks! :)
The whole code:
import build_database_tuple
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
# few functions to initialize the weights of the layers properly (positive etc.)
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

# convolution and pooling layers definition
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')

def variable_summaries(var):
    """Attach a lot of summaries to a Tensor (for TensorBoard visualization)."""
    with tf.name_scope('summaries'):
        mean = tf.reduce_mean(var)
        tf.summary.scalar('mean', mean)
        with tf.name_scope('stddev'):
            stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
        tf.summary.scalar('stddev', stddev)
        tf.summary.scalar('max', tf.reduce_max(var))
        tf.summary.scalar('min', tf.reduce_min(var))
        tf.summary.histogram('histogram', var)
# from the previous code (mnist):
print('START')
# INTIAL PARAMETERS
# database:
data_home_dir='/home/dir/to/data/'
validation_ratio=(1.0/8)
patch_size=32
test_images_num=5000*1 # csv_batchsize*number of test batches files
train_images_num=78000+78000-test_images_num # posnum + negnum
# model parameters:
first_conv_kernel_size=5
first_conv_output_channels=32
sec_conv_kernel_size=5
sec_conv_output_channels=64
fc_vec_size=512
# train and test parameters
train_epoches_num=5
train_batch_size=100
test_batch_size=100
learning_rate=1*(10**(-4))
summaries_dir='/dir/to/log/files/'
# load data
folds = build_database_tuple.load_data(data_home_dir=data_home_dir,validation_ratio=validation_ratio,patch_size=patch_size)
# starting the session. using the InteractiveSession we avoid building the entire comp. graph before starting the session
sess = tf.InteractiveSession()
# start building the computational graph
# the 'None' leaves the batch size open for now
x = tf.placeholder(tf.float32, shape=[None, patch_size**2]) # input images - 32x32=1024
y_ = tf.placeholder(tf.float32, shape=[None, 2]) # output classes (using one-hot vectors)
# the variables for the linear layer
W = tf.Variable(tf.zeros([(patch_size**2),2])) # weights - 1024 input features and 2 outputs
b = tf.Variable(tf.zeros([2])) # biases - 2 classes
# initialize all the variables using the session, so they can be used in it
sess.run(tf.initialize_all_variables())
# implementation of the regression model
y = tf.nn.softmax(tf.matmul(x,W) + b)
# Done!
# FIRST LAYER:
with tf.name_scope('layer1'):
    # build the first layer
    W_conv1 = weight_variable([first_conv_kernel_size, first_conv_kernel_size, 1, first_conv_output_channels]) # 5x5 patch, 1 input channel, 32 output channels (features)
    b_conv1 = bias_variable([first_conv_output_channels])
    x_image = tf.reshape(x, [-1,patch_size,patch_size,1]) # reshape x to a 4d tensor. dims 2,3 are the image dimensions, dim 4 is one color channel
    # apply the layers
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
    h_pool1 = max_pool_2x2(h_conv1)
    with tf.name_scope('weights'):
        variable_summaries(W_conv1)
    with tf.name_scope('biases'):
        variable_summaries(b_conv1)
# SECOND LAYER:
with tf.name_scope('layer2'):
    # 64 features each 5x5 patch
    W_conv2 = weight_variable([sec_conv_kernel_size, sec_conv_kernel_size, first_conv_output_channels, sec_conv_output_channels]) # input-channel dim must be the first layer's output channels, not patch_size
    b_conv2 = bias_variable([sec_conv_output_channels])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)
    with tf.name_scope('weights'):
        variable_summaries(W_conv2)
    with tf.name_scope('biases'):
        variable_summaries(b_conv2)
# FULLY CONNECTED LAYER:
with tf.name_scope('fc'):
    # 512 neurons; 8x8 is the new spatial size after 2 pooling layers
    W_fc1 = weight_variable([(patch_size//4) * (patch_size//4) * sec_conv_output_channels, fc_vec_size])
    b_fc1 = bias_variable([fc_vec_size])
    h_pool2_flat = tf.reshape(h_pool2, [-1, (patch_size//4) * (patch_size//4) * sec_conv_output_channels])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
    # dropout layer - meant to reduce over-fitting
    with tf.name_scope('dropout'):
        keep_prob = tf.placeholder(tf.float32)
        tf.summary.scalar('dropout_keep_probability', keep_prob)
        h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
    with tf.name_scope('weights'):
        variable_summaries(W_fc1)
    with tf.name_scope('biases'):
        variable_summaries(b_fc1)
# READOUT LAYER:
with tf.name_scope('softmax'):
    # softmax regression; keep the raw logits around for the loss below
    W_fc2 = weight_variable([fc_vec_size, 2])
    b_fc2 = bias_variable([2])
    logits = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
    y_conv = tf.nn.softmax(logits)
    with tf.name_scope('weights'):
        variable_summaries(W_fc2)
    with tf.name_scope('biases'):
        variable_summaries(b_fc2)
# TRAIN AND EVALUATION:
with tf.name_scope('cross_entropy'):
    # cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1])) # can be numerically unstable. old working calculation
    # note: softmax_cross_entropy_with_logits expects the raw logits, not the already-softmaxed y_conv
    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_))
    tf.summary.scalar('cross_entropy', cross_entropy)
with tf.name_scope('train'):
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)
with tf.name_scope('accuracy'):
    with tf.name_scope('correct_prediction'):
        correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
    with tf.name_scope('accuracy'):
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar('accuracy', accuracy)
# Merge all the summaries and write them out to /tmp/mnist_logs (by default)
merged = tf.summary.merge_all()
train_writer = tf.train.SummaryWriter(summaries_dir + '/train', sess.graph)
test_writer = tf.train.SummaryWriter(summaries_dir + '/test')
#tf.global_variables_initializer().run()
sess.run(tf.initialize_all_variables())
# variables for the plotting process
p11 = []
p12 = []
p21 = []
p22 = []
f0 = plt.figure()
f1 = plt.figure()
train_accuracy=0
# starting the training process
for i in range((train_images_num*train_epoches_num)//train_batch_size):
    if i % 50 == 0: # every 50 iterations
        #train_accuracy = accuracy.eval(feed_dict={x:batch[0], y_: batch[1], keep_prob: 1.0})
        # calculate validation accuracy
        val_batch = folds.validation.next_batch(train_batch_size)
        #val_accuracy = accuracy.eval(feed_dict={x: val_batch[0], y_: val_batch[1], keep_prob: 1.0})
        summary, val_accuracy = sess.run([merged, accuracy], feed_dict={x: val_batch[0], y_: val_batch[1], keep_prob: 1.0})
        test_writer.add_summary(summary, i)
        print('Accuracy at step %s: %s' % (i, val_accuracy))
    # The train step
    else:
        # note: 'batch' was never defined in the original; assuming folds.train mirrors folds.validation
        batch = folds.train.next_batch(train_batch_size)
        summary, _ = sess.run([merged, train_step], feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
        train_writer.add_summary(summary, i)
# Save Network
saver = tf.train.Saver()
save_path = saver.save(sess,'/dir/to/model/files/model.ckpt')
print("Model saved in file: %s" % save_path)
Following sunside's comment, I updated my TensorFlow version and the problem was solved.
Apparently, tf.scalar_summary() was the API in TensorFlow 0.10, and it was renamed to tf.summary.scalar() in newer versions (0.12, at least).
Running pip install -U tensorflow in the terminal solved the problem immediately :)
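If upgrading were not an option, a small compatibility shim along these lines should also work (a sketch; both the old and new calls take the tag first and the tensor second):
# use the new summary API if present, otherwise fall back to the pre-0.12 name
try:
    scalar_summary = tf.summary.scalar
except AttributeError:
    scalar_summary = tf.scalar_summary
scalar_summary('mean', mean)  # same call sites as before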
I was going through this TensorFlow tutorial:
https://www.tensorflow.org/versions/r0.9/tutorials/mnist/beginners/index.html
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10])) #weights
b = tf.Variable(tf.zeros([10])) #bias
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
Towards the very end, we pass test data to the placeholders. y_ is the matrix containing the true values, and y is the matrix of predicted values. My question is: when is y computed for the test data? The W matrix has been trained by backpropagation, but this trained matrix must be multiplied with the new input x (the test data) to give the prediction y. Where does this happen?
Normally I have seen sequential execution of code, and in the last few lines y isn't called explicitly.
accuracy depends on correct_prediction which depends on y.
So when you call sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}), y is computed before accuracy is computed. All this happens inside the TensorFlow graph.
The TensorFlow graph is the same for train and test. The only difference is the data you feed to the placeholders x and y_.
y is computed here:
y = tf.nn.softmax(tf.matmul(x, W) + b) # Line 7
Specifically, what you are looking for is within that line:
tf.matmul(x, W) + b
the output of which is put through the softmax function to identify the class.
This is computed in each of the 1000 passes through the graph; each time, the variables W and b are updated by GradientDescent, and y is computed and compared against y_ to determine the loss.
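If you want to see this happen explicitly, you can fetch y itself; nothing in the graph is evaluated until some sess.run (or .eval()) call requests it:
# fetch the predictions for the test set directly;
# this runs tf.matmul(x, W) + b and the softmax on the data fed here
test_predictions = sess.run(y, feed_dict={x: mnist.test.images})
print(test_predictions.shape)  # (10000, 10): one row of class probabilities per test image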